




BUSINESS 
STATISTICS 

by 

MARTIN A. BRUMBAUGH, Ph.D. 

Professor of Statistics, University of Buffalo 

and 
LESTER S. KELLOGG, M.A. 

Assistant Professor of Statistical Research 
Ohio State University 

with the collaboration of 
IRENE J. GRAHAM, M.A. 

Laboratory Assistant, Department of Statistics
University of Buffalo 




1950 

RICHARD D. IRWIN, INC.

CHICAGO 



COPYRIGHT 1941 BY RICHARD D. IRWIN, INC.

ALL RIGHTS RESERVED. THIS BOOK OR ANY PART THEREOF MAY NOT
BE REPRODUCED WITHOUT THE PERMISSION OF THE PUBLISHER



FIRST EDITION 

First Printing September, 1941 

Second Printing August, 1942 

Third Printing February, 1946

Fourth Printing August, 1946 

Fifth Printing November, 1946

Sixth Printing May, 1947 

Seventh Printing October, 1947 

Eighth Printing February, 1949 

Ninth Printing September, 1949

Tenth Printing October, 1950



PRINTED IN THE UNITED STATES OF AMERICA



PREFACE 

AS STATISTICAL METHODS have been gradually expanded
in recent years, textbook writers on the subject have exhibited 
a noticeable tendency to increase the amount of advanced 
material at the expense of the elementary content. The authors of 
this book hold to the opinion that the first requisite of an adequate 
structure is a sound foundation. Pursuant to this point of view they 
have attempted to place unusual emphasis upon the elementary or 
foundation methods of the subject. No attempt has been made to 
attain the research frontiers of any phase of statistical analysis. In 
short the aim is to present statistical materials and methods that are 
in everyday use in the conduct of business affairs. 

The readers of statistical texts might be divided into two broad 
groups: those who will compile, analyze, and interpret statistical data 
and those who will be the users of the results prepared by the first 
group. The latter readers comprise much the larger group and for 
their use either as students or business practitioners this book provides 
the essentials of method and the mental conditioning so necessary for 
effective "statistical consumption." The training requirements of the 
first group, the statistical producers, differ somewhat according to the 
level at which they engage in statistical work. Those few who are 
conducting advanced research work have little need for this book.
The larger number who either now or in the future contemplate engag- 
ing in the usual type of statistical collection, analysis, and presentation 
carried on within business concerns, statistical organizations, and gov- 
ernmental agencies will find that the contents of this book provide 
sound guidance. 

The student will quickly discover that the methods of statistics form 
a related whole; that the division into chapters is for convenience in 
the classroom rather than for the separation of disconnected subject 
matter. The structure is built literally method upon method like the 
bricks in a wall and, to carry the analogy a step farther, the binding 
mortar is reasoning rather than memory. The student who attempts 
to acquire statistical knowledge solely by memorizing rules and for- 
mulas invariably fails to develop the power to apply his skill to prac- 
tical problems. On the other hand the student who approaches the
subject with an eternal "Why?" and insists on having his curiosity
satisfied has a much better chance of developing the power needed to 
solve new problems as they arise. 

This distinction of attitude is exemplified by a criticism often leveled 
at the writers of statistical texts: "If they would only tell us what 
techniques to use in analyzing different kinds of data they could save 
so much time and take most of the mystery out of the subject." If the 
quoted suggestion could be put into effect, statistical practice would in 
truth be reduced to a routine level. But the case is not so simple 
because the type of analysis that should be applied to a particular set 
of data depends entirely on the purpose of the analysis, the specific 
use that will be made of the results, the time and funds that are 
available, and other related considerations. Therefore neither in this 
book nor in any other will rules be found relating methods of analysis 
to data on specific subjects. The function of the statistician must always 
be the exercise of judgment as to the proper methods to employ in the 
investigation of a given problem. 

From the standpoint of mathematics, this book assumes only that 
the reader has proficiency in arithmetic and enough knowledge of 
algebraic symbolism to be able to substitute values in a formula. Even 
this modest assumption is partially reinforced by the introduction early 
in the book of a chapter entitled "The Use of Numbers," the content 
of which is partially a review of arithmetic. The explanations in the 
book assume that the reader possesses some familiarity with economics 
and understands in an elementary way the organization and functioning 
of individual business concerns. A knowledge of accounting principles, 
marketing principles, and recent business history including the relation 
of government to business provides a desirable although less essential 
background. 

The subject matter of this book can be covered in a ninety-hour 
course. By reducing the intensity of coverage somewhat, the entire 
content can be included in a sixty-hour course. For briefer courses 
some chapters will undoubtedly have to be omitted entirely or in part. 
It is difficult to specify which chapters might be omitted, since each 
case requires a knowledge of the point of view of the instructor, the 
capabilities of the particular group of students, and the purpose of 
the statistics course in the curriculum. 

The problems appearing at the end of each chapter have been so 
planned that the student who prepares answers to all or a large part
of them will be forced to apply the important principles of the subject
to situations akin to those found in statistical practice. The authors 
hold steadfastly to the opinion that effective teaching of statistics 
requires the liberal use of problems to insure facility in computation, 
to provide constant practice in selecting and adapting methods to 
particular situations, to develop proficiency in the interpretation of the 
results of investigation, and to encourage accurate reporting of com- 
pleted analyses. Many of the problems have been reduced in size to 
accommodate them to the needs of students, but an attempt has been 
made to avoid simplification to the point of absurdity. 

A list of references is appended to each chapter. These lists are 
intended to be selective rather than comprehensive. The publications 
named are included because they contain material which supplements 
the development in the text or because they offer the opportunity for 
either more intensive or more advanced study of the subject matter in 
the text. The reference lists are not intended to set forth the sources 
from which assistance was drawn for this book. Wherever such assist- 
ance is specific, direct footnote reference has been made to the source 
and the author's permission for use has been secured. For aid in the 
broader sense the authors are permanently indebted to writers in the 
field of statistics in so many directions that anything beyond general 
acknowledgment of the obligation would be impossible. If, by some 
mischance, the authors have failed to acknowledge materials reproduced 
from other writers, such omission is wholly unintentional. 

It is impossible to make a complete acknowledgment to Miss Irene 
Graham, who has collaborated with the authors in the preparation 
of this text. Her most outstanding work was the writing of chapters 
XIII and XIV and major revision of chapter XI. She has also con- 
tributed materially to the text of chapters IV to X, XV, XVII, XIX, 
and XXX, in addition to research, criticism, reorganization and editing 
of all chapters. 

Dr. Robert Riegel, professor of statistics, University of Buffalo, has 
read and criticized the manuscript in various stages of preparation. His 
suggestions concerning statistical soundness and pedagogical desir- 
ability represent a contribution that the authors acknowledge gratefully. 

Chapter XXVI has been made possible through the co-operation of 
Mr. A. H. Robinson, assistant-treasurer, and Mr. Lawrence M. Tarnow, 
head of the planning department of Eastman Kodak Company of 
Rochester, New York. To them for their assistance, and to the Eastman
Kodak Company for permission to use the material, the authors express
their sincere thanks. 

Most of the graphs were prepared by Mr. Ralph Lownie, a student 
in the School of Business Administration, University of Buffalo. Their 
quality is due entirely to his craftsmanship. The remaining graphs were
drawn by Mrs. Dorothy Tallman, who also shared with Mrs. Ruth 
Carroll the extremely laborious task of typing the manuscript. 

M. A. BRUMBAUGH 
LESTER S. KELLOGG 
September, 1941 



TABLE OF CONTENTS 

CHAPTER PAGE

I. STATISTICS IN BUSINESS 1

The Statistical Approach. The Work of the Statistician. 

II. THE USE OF NUMBERS 13 

Introduction. The Fundamental Operations. Fractions. Square Roots. 
Accuracy of Statistical Data. Summary. 

III. STATISTICAL INVESTIGATION 43 

The Character of Statistical Investigation. The Canons of Statistical 
Investigation. Steps in Statistical Investigation. The Scope of Dif- 
ferent Investigations. Summary. 

IV. PRELIMINARY PLANNING OF INVESTIGATIONS 53 

Introduction. Define the Problem. Study the Problem. Plan the 
Procedure. Prepare a Statement of the Program. 

V. SAMPLING 69

Relation to Knowledge. The Importance of Sampling. The Prin- 
ciple of Statistical Regularity. The Two Problems of Sampling. 

VI. COLLECTION OF DATA DIRECT SOURCES 92

Description of Direct Sources. Collecting Data From Direct Sources. 
Summary. 

VII. EDITING AND PRELIMINARY TABULATION 118 

Editing Schedules. Preliminary Tabulation. 

VIII. TABULATION 144 

Definitions. Types of Tables. Established Practice in the Construc- 
tion of Tables. Tabular Forms. 

IX. CLASSIFICATION OF LIBRARY SOURCES 186 

The Meaning of Collection From Library Sources. Methods of 
Classifying Sources. Appendix A Selected Sources Listed According 
to Publishing Agency, Title, Frequency of Publication, and Contents. 

X. THE USE OF LIBRARY SOURCES 210

Introduction. Finding a Good Source. The Correct Use of Data. 
Appendix B Examples of Search for Data in Libraries. 

XI. RATIOS 229 

The Importance of Ratios in Statistics. Construction of Statistical 
Ratios. Presentation of Ratios. Comparisons Between Ratios. 



XII. APPLICATIONS OF RATIOS 274 

Refined Ratios. Compound Ratios. Examples of the Use of Ratios 
in Business. 

XIII. GRAPHS 298 

Introduction. Simple Types of Graphs Methods and Purposes of 
Each. Introduction to Two-Dimensional Linear Graphs. 

XIV. GRAPHS (continued) 323 

Time Series Graphs. Planning Graphs for General Effect. 

XV. FREQUENCY DISTRIBUTIONS AND GRAPHS 350 

Frequency Distributions. Graphs of Frequency Distributions. 

XVI. MEASURES OF CENTRAL TENDENCY AVERAGES OF CALCULATION 387

Introduction. The Arithmetic Average. The Geometric Average. 

XVII. MEASURES OF CENTRAL TENDENCY AVERAGES OF POSITION 413 

The Median. The Mode. Criteria for Selecting and Judging Aver- 
ages. 

XVIII. MEASURES OF DISPERSION AND SKEWNESS 436 

Introduction. Dispersion. Skewness. Uses of Measures of Disper- 
sion and Skewness. 

XIX. INDEX NUMBERS 461 

Introduction. Kinds of Index Numbers. Basic Methods of Con- 
structing Index Numbers. Symbols. Unweighted Formulas. Weighted 
Formulas. Problems of Index Number Construction. Tests of Index 
Numbers. Specific Uses of Index Numbers and Their Interpretation. 

XX. SOME COMMONLY USED INDEXES 506 

Introduction. An Index of Cost of Living. An Index of Industrial 
Production. An Index of Employment. A Wholesale Price Index. 
A Local Business Index. 

XXI. ANALYSIS OF TIME SERIES 537 

A Change in Emphasis. Components of Time Series. The Problem 
of Time Series Analysis. Preliminary Analysis. 

XXII. TREND 560 

The Location of Trend. Methods of Measuring Trend. Why Trend 
is Measured. Summary. 

XXIII. SEASONAL AND CYCLICAL MOVEMENTS 592 

Introduction. The Nature of Seasonal Variation. The Concept of a 
Seasonal Pattern. Methods of Measuring Seasonal Variation. The 
Cyclical Remainders. Special Seasonal Problems. 



XXIV. SUMMARY OF THE ANALYSIS OF TIME SERIES (An Example) 627 

Adjustment of Calendar Variation. Adjustment of Changes in the 
Price Level. Adjustment of Seasonal Variation. Adjustment of Trend 
Moving-Trend Method. The Cyclical Fluctuations. 

XXV. INDEXES OF BUSINESS CONDITIONS 649 

Need for External Information. Two Types of Business Indicators. 
Construction of Composite Indexes. Examples of the Construction of 
Composite Indexes. Local Indexes. Use of Business Indexes. 

XXVI. INTERNAL APPLICATION OF TIME-SERIES ANALYSIS PRODUC- 
TION PLANNING 682 

Introduction. General Discussion of Production Planning. Two 
Examples of Planning. Summary. 

XXVII. CORRELATION 704 

Introduction. Scattergram. The Regression Line. The Standard Error 
of Estimate. The Coefficient of Correlation. Some Deferred Points. 
The Rank Difference Measure of Correlation. Correlation of Time 
Series. Recapitulation of Formulas. 

XXVIII. THE NORMAL CURVE 752 

Probability. Binomial Distribution. The Normal Curve. Other 
Types of Distributions. Goodness of Fit Chi-Square Test. 

XXIX. PRINCIPLES OF SAMPLING AND TESTS OF SIGNIFICANCE 785

Introduction. The Basis of Sampling. Tests of Significance. Small 
Samples. Variance Analysis. 

XXX. PRESENTATION OF THE RESULTS OF STATISTICAL INVESTIGA- 
TION 826 

Introduction. The Writer Reader Relation. Requirements of a 
Report. The Form of a Report. 

APPENDIX A SELECTED SOURCES LISTED ACCORDING TO 
PUBLISHING AGENCY, TITLE, FREQUENCY OF PUBLICATION, 
AND CONTENTS 197 

APPENDIX B EXAMPLES OF SEARCH FOR DATA IN LIBRARIES 224 

APPENDIX C LOGARITHMS OF NUMBERS 845 

FIVE PLACE TABLE OF LOGARITHMS . . . 857 

APPENDIX D TABLE OF SQUARES, SQUARE ROOTS AND RE- 
CIPROCALS 875 

APPENDIX E TABLE OF ORDINATES 895 

APPENDIX F TABLE OF AREAS 896 



FIGURES 

FIGURE PAGE 

1. Two Examples of the Development by Successive Steps of the 

Solution for Extracting Square Root 24-25 

2. Schedule Used in a Real Estate Survey 95 

3. Proposed Revision of Real Estate Schedule, Figure 2 .... 96 

4. Questionnaire Used in Worsted Spinning Spindle Inventory . . 101 

5. Radio Section of Questionnaire Used in Surveying the College 

Market 105 

6. Letter with Appeal for Reply Based on Co-operation 106 

7. Letter with Appeal for Reply Based on Interest 107 

8. Letter with Appeal for Reply Based on Profit 107 

9. Letter with Appeal for Reply Based on Obligation 108 

10. Letter with Appeal for Reply Based on Position 108 

11. Letter Based on Compulsion 109 

12. Example of Instructions to Collecting Agents 110 

13. Collection Card Used in Residential Vacancy Investigation in 

Buffalo, New York 123 

14. Tally Sheet for Recording Residential Vacancy in Buffalo, New 

York 126 

15. Proposed Work Sheet for Questionnaire Used in College Market 

Investigation, Figure 5 128 

16. Punched Cards for Mechanical Tabulation 132 

17. Schedule Used in Collecting Data for the President's Conference 

on Home Building and Home Ownership, 1930 134 

18. Instructions for Coding Data on Home Building and Home Owner- 

ship in Buffalo, New York, 1930 135 

19. Code Sheet Used in Transferring Information from Questionnaire,

Figure 17 138 

20. Reproduction of the Printed Record from the Tabulating Machine 

with Headings Added 140 

21. Form for Two-Way Cross-Classification 157

22. Form for Three-Way Cross-Classification 157

23. Form for Four-Way Cross-Classification 158

24. Form for Five-Way Cross-Classification 159

25. Reproduction of Department of Agriculture Form C E. 1-128 . . 171 

26. Reproduction of Department of Agriculture Long-Term Blank . . 172 

27. Reproduction of Department of Agriculture Form C E. 1-139 . . 173 

28. Eastman Kodak Co. Form Comparison of Sales by Divisions . . 175 

29. Eastman Kodak Co. Form Lost Time Report 176-177 

30. Instructions on the Reverse Side of Lost Time Report . . . 178-179 

31. Eastman Kodak Co. Form Labor Turnover Report 181 

32. Organization Chart of the Government of the United States 

between 192-193 

33. Classification of Comparisons Between Ratios of Like Items with 

Examples of Each 256 

34. Classification of Comparisons Between Ratios of Unlike Items 

with Examples of Each 264 

35. Types of Graphs 299 

36. Dot Maps: Filling Stations in the United States, 1935 .... 305 

37. Cross-Hatched Ratio Map: Filling Stations per 10,000 Persons in 

the United States, 1935 308 

38. Flow Map: United States Exports, 1931 311 

39. Dial Chart: Index of Industrial Activity as of May 31, 1941 . . 313 

40. Pictogram: Number of Workers in Basic Fields of Employment, 

1940 314 

41. Types of Bar Graphs 316 

42. Bar Graph of Time Series 323 

43. Band Graphs of Time Series: Per Cents and Dollar Values . . 325 

44. Line Graphs of Time Series 328-329 

45. Construction of the Ratio Scale 333 

46. Curves Showing Changing Relative Rates on a Ratio Scale . . 336 

47. Types of Lines 341 

48. Methods of Plotting Time Periods 345 

49. Tally of Monthly Rents Paid by 155 Families in a Consumer Survey 

in Columbus, Ohio 353 

50. Array of Rents Paid by 155 Families in Columbus, Ohio . . . 354 

51. Methods of Designating Class Limits 361 

52. Two Types of Frequency Diagram of Rent Data 367 

53. Frequency Diagrams of Discrete Data. Number of Dresses Sold in 

Junior Sizes 370 

54. Ogives: Cumulative Frequency Diagram of Rent Data . . . . 373 

55. Frequency Diagrams of Hourly Wage Rates Paid by Fifty-two 

Industrial Concerns 376 

56. Per Cent Comparison of Two Distributions of Rent Data ... 378 

57. Types of Curves 380 

58. Lorenz Curves: Cumulative Per Cents of Stores and Sales, Inde- 

pendent Retail Grocery Stores in Buffalo, 1929 and 1935 . 383 



59. Location of the Median in an Array 414 

60. Location of the Median in a Frequency Distribution .... 416 

61. Summary of Characteristics of Measures of Central Tendency . 432 

62. Guide to the Suitability of Measures of Central Tendency Accord- 

ing to the Condition of the Data 432 

63. Comparison of Columbus Rentals with Normal Distribution Ac- 

cording to Measures of Dispersion 449 

64. Fractions of the Area of the Normal Curve Measured by the 

Standard Deviation 451 

65. Summary of Criteria of Measures of Dispersion 454 

66. Sources of Commonly Used Index Numbers 465 

67. Nomograph for Reading Per Cents of Increase or Decrease in 

Index Numbers 499 

68. National Industrial Conference Board Index of Cost of Living, 

Monthly, 1923-40 514 

69. Federal Reserve Board Index of Industrial Production, Monthly, 

1919-40 521 

70. Indexes of Employment and Pay Rolls, Monthly, 1919-40 . . 526 

71. Wholesale Price Indexes of National Fertilizer Association and 

United States Bureau of Labor Statistics; Annually, 1929-35; 
Monthly, 1936-39; Weekly, January 1940-April 1941 ... 531 

72. Index of Bank Debits in Canton, Ohio, Monthly, 1926-40 . . 535 

73. Production of Wheat, Passenger Automobiles and Anthracite Coal 

in the United States, 1900-1937, and Free Hand Trend for 
Each Series 540 

74. Consumption of Raw Cotton in the United States 1913-37 . . 543 

75. Monthly Sales of F. W. Woolworth Company, 1930-37 ... 546 

76. Daily Net Currency Movement in New York City to or from the 

Federal Reserve Bank of New York, April to September, 1926 548 

77. Monthly Totals and Daily Averages for 1936 for Three Sets of 

Data in Buffalo, New York: Sales of a Drug Store, Flour 
Milled and Bank Clearings 553 

78. Raw Cotton Exported by the United States 1913-22 555 

79. Investment in Inventory of Swift and Company Packers, 1913-36 557 

80. Moving Averages Fitted to Controlled Data Containing Cycle and 

Straight Line Trend 564 

81. Moving Average Fitted to Controlled Data Containing Cycle and 

Curved Line Trend 565 

82. Number of Horsepower of Diesel Engines Installed Annually, 

1918-37 568 



83. Diagram Used to Write the Equation of a Straight Line . . 572 

84. Parabola Trend Fitted to Postal Receipts at Buffalo, New York, 

1920-33 579 

85. Logarithmic Trend Fitted to Production of Wood Pulp, 1923-37 . 580 

86. Straight Line Trend Fitted to the Number of Lines of Magazine 

Advertising, 1913-37 584 

87. Relative Cycles of Magazine Advertising, 1913-37 585 

88. Straight Line Trend Fitted to Electric Power Production, 1919-29 588 

89. Daily Average Consumption of Small Cigarettes, Monthly, 1927-36 595 

90. Approximate Method: Test for Seasonal Pattern of Relatives of 

Annual Averages 598 

91. Moving Average Method: Test for Seasonal Pattern of Relatives 

of Moving Averages 602 

92. Link-Relative Method: Test for Seasonal Pattern of Relatives of 

Preceding Month 605 

93. Ratio-to-Trend Method: Test for Seasonal Pattern of Relatives 

of Trend 610 

94. Seasonal Patterns of Cigarette Consumption According to Four 

Methods 613 

95. Relative Cycles of Cigarette Consumption, Monthly, 1927-36 . . 619 

96. Successive Steps in the Analysis of a Time Series 637 

97. Relative Cycles of Industrial Stock Prices and Commercial Paper 

Rates, Monthly Data, 1919-37 655 

98. Annalist Index of Business Activity 668 

99. Babson Chart of Business Conditions 673 

100. Buffalo Index of Business Activity 676 

101. Eastman Kodak Co. Planning Chart for Product "S" .... 693 

102. Eastman Kodak Co. Planning Chart for Product "C" .... 691 

103. Three Scattergram Patterns 706 

104. Freehand Regression Line Showing Relation Between Prices and 

Earnings per share of Common Stock of Twelve Chemical 

Manufacturers 707 

105A-B-C Regression Line Fitted by Least Squares Method to Prices 
and Earnings per share of Common Stock of Twelve Chemical 
Manufacturers 709-12-13 

106. Standard Error of Estimate and Standard Deviation of y, Prices 

in Relation to Earnings Per Share of Common Stock of Twelve 
Chemical Manufacturers 718 

107. Rates Charged by Banks for Customer Loans in Eight Northern and 

Eastern Cities Exclusive of New York City, and the Yield of 
Aaa Bonds, with Straight Line Trend for Each Series, 1919-37 737 



108. Curves of Three Binomial Expansions Compared with Normal 

Curve 757 

109. Normal Curve Plotted by Calculating Values of Ordinates . . . 764 

110. Binomial and Normal Distributions Fitted to Frequency Distribu- 

tion of Monthly Cost of Electric Current 768 

111A. Binomial Frequency Distributions, $N(q+p)^{10}$, for Various Values

of q and p when N = 100 772

111B. Binomial Frequency Distributions, $N(q+p)^{n}$, for Various Values

of n when q = .9, p = .1, and N = 100 773

112. Diagram for Finding Values of P Associated with Computed Values

of $\chi^2$ and $N - m$ 778

113. Diagram for Finding Values of 2P Associated with Computed

Values of $t$ and $N - m$ 807

114. Diagram for Finding the Value of z Associated with 2P = .05

for a Given $N_1 - m_1$ and $N_2 - m_2$ 812

115. Diagram for Finding the Value of z Associated with 2P = .01

for a Given $N_1 - m_1$ and $N_2 - m_2$ 813



CHAPTER I 
STATISTICS IN BUSINESS 

THE STATISTICAL APPROACH 

WHEN MASSES of numerical information are to be analyzed 
some means of summarization must be found which will 
focus attention upon their major characteristics. Statistical 
methods have been developed to meet this need; hence in a broad 
sense the statistical approach is essentially a process of classification, 
subclassification, and cross-classification designed to give meaning to 
a mass of information by separating it into comparable parts. Statistical 
methods therefore are useful in any field of knowledge in which the 
recording of events produces masses of numerical information. The 
more important fields are psychology, sociology, education, medicine, 
biology, public affairs, economics, and business. 

Statistical Data Distinguished from Abstract Numbers 

Not all numbers are statistics. A table of logarithms is not a statisti- 
cal table, but simply a compilation of abstract numbers obeying a fixed 
law. On the other hand statistical data are concrete numbers represent- 
ing objects or measurements grouped according to stated characteristics. 
For example in Table 1 sales of low-priced automobiles are classified
by make of car and by year of production. This double classification
permits comparison of sales of the three makes in any year and the
comparison of sales of each with the total. The changes in individual
and total sales from year to year can also be read from the
table.

TABLE 1

SALES OF PASSENGER AUTOMOBILES DURING THE MODEL YEAR 1937-39:
THREE MAKES IN THE LOW-PRICED FIELD*

MAKE OF                        MODEL YEAR
AUTOMOBILE            1937          1938          1939

Chevrolet            804,350       465,403       577,986
Ford                 807,258       345,244       456,792
Plymouth             500,503       268,436       387,452

Total              2,112,111     1,079,083     1,422,230

*Compiled from The Annalist, Vol. 49, No. 1258, p. 350; Vol. 51, No. 1310, p. 304;
Vol. 53, No. 1362, p. 305; Vol. 55, No. 1418, p. 433.

Statistics deals with numbers not merely as such, but as the expres- 
sion of a quantitative or qualitative relationship of the concepts with 
which they are associated. Statistical work is for the most part a mat- 
ter of expressing these relationships in the best form, and of finding 
new relationships. Thus the comparisons observed in Table 1 might 
be facilitated by the computation of per cent distributions and index 
numbers. The development of such techniques of analysis forms a 
major part of the content of subsequent chapters. 
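
The two computations just named can be sketched briefly. The following Python fragment is an illustration only and is not taken from the text: the function names are hypothetical, and the choice of 1937 as the base year for the index numbers is an assumption made for the example. The sales figures are those of Table 1.

```python
# Sales figures copied from Table 1 (three low-priced makes, model years 1937-39).
SALES = {
    "Chevrolet": {1937: 804_350, 1938: 465_403, 1939: 577_986},
    "Ford":      {1937: 807_258, 1938: 345_244, 1939: 456_792},
    "Plymouth":  {1937: 500_503, 1938: 268_436, 1939: 387_452},
}
YEARS = (1937, 1938, 1939)


def totals_by_year(table):
    """Sum the sales of all makes for each model year."""
    return {year: sum(row[year] for row in table.values()) for year in YEARS}


def percent_distribution(table):
    """Express each make's sales as a per cent of that year's total."""
    totals = totals_by_year(table)
    return {
        make: {year: 100.0 * row[year] / totals[year] for year in YEARS}
        for make, row in table.items()
    }


def index_numbers(series, base_year=1937):
    """Express each year's value as a per cent of the base-year value.

    The base year is an assumption of this sketch, not a choice made in the text.
    """
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


if __name__ == "__main__":
    shares = percent_distribution(SALES)
    total_index = index_numbers(totals_by_year(SALES))
    for make in SALES:
        print(make, {y: round(shares[make][y], 1) for y in YEARS})
    print("Total index (1937 = 100):", {y: round(v, 1) for y, v in total_index.items()})
```

The per cent distribution answers the question of each make's share of the three-make total in a given year; the index numbers show the movement of a single series from year to year on a common base.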

Statistics in the Field of Business 

While the statistical procedures useful in the several fields of knowl- 
edge are in the main identical, those procedures must be adapted to 
the particular types of information found in each field. The use of 
ratios is a basic method of analysis common in all types of statistical 
work but the emphasis on different kinds of comparison varies mark- 
edly from one field to another. In vital statistics the study of death 
rates leads to the development of crude rates, specific rates, standardized 
rates, and corrected rates. On the other hand business data require 
per cent relations, per cents of change, per capita ratios, per cent dis- 
tributions, and index numbers. Whether used in vital statistics or 
business statistics the word "ratio" implies a relation between two items 
one of which is the numerator and the other the denominator. But 
the examples cited show the variation in usage and suggest the extent 
to which subject matter determines what type of ratio comparisons will 
be emphasized. 
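
As a rough illustration of two of the business ratios listed above, the sketch below computes a per cent of change and a per capita ratio. It is offered under stated assumptions: the function names and the default base of 10,000 persons are hypothetical choices for the example, and only the Ford sales figures are drawn from the text (Table 1).

```python
def percent_change(earlier, later):
    """Per cent of change from an earlier value to a later value."""
    if earlier == 0:
        raise ValueError("per cent of change is undefined for a zero base")
    return 100.0 * (later - earlier) / earlier


def per_capita(count, population, per=10_000):
    """Ratio of a count to a population, stated per `per` persons.

    The numerator is `count` and the denominator is `population`;
    the default base of 10,000 persons is an assumption of this sketch.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return per * count / population


if __name__ == "__main__":
    # Ford sales for model years 1937 and 1938, taken from Table 1.
    change = percent_change(807_258, 345_244)
    print(f"Ford, 1937 to 1938: {change:.1f} per cent")
```

In both functions the denominator is checked before dividing, since a ratio has no meaning when its base is zero.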

The relation of emphasis to subject matter can be illustrated further 
by considering time-series analysis. The business statistician spends a 
major fraction of his time in separating time series into their several 
components primarily to segregate the cyclical fluctuations. In such 
fields as medicine, psychology, education, and biometry the techniques 
of time-series analysis are relatively unimportant and when used seldom 
have as a goal the study of cyclical fluctuations. In this illustration, as 
in the preceding one, subject matter and purpose determine to a large 
extent the form of use and the importance of a particular method of 
analysis. 



These examples should be sufficient to indicate that a development 
of statistical methods applicable to a particular field of knowledge 
becomes more specific than a general presentation, and therefore is 
better suited to the needs of those interested in that field. Consequently 
this book is devoted in the main to a presentation of statistical methods 
and operating techniques suitable for the analysis of masses of numeri- 
cal information arising in the field of business. 

The word "business" is taken to include the aggregate of activities
involved in transforming raw materials into finished consumable 
products and transferring goods at all stages of the process. The usual 
divisions of the field are production, marketing, financial operations, 
and transportation and communication. Such activities as legal advis- 
ing, accounting, and technical research including statistical work have 
not been listed as divisions of business because they are adjunct to all 
phases of business operation. Statistics in particular may assist in the 
solution of problems arising in any part of the business field but has
its greatest usefulness when large masses of numerical information 
are to be analyzed. 

The following illustrations of the uses of statistical methods in the 
four main divisions of business enterprise should give sufficient evi- 
dence of the pervasiveness of the statistician's work. 

Production

    Preparation of production schedules
    Determination of distribution of sizes in manufacture of hats, shoes,
        suits, dresses, etc.
    Analysis of time and motion studies
    Cost analyses

Marketing

    Determination of sales areas and sales quotas
    Study of effectiveness of advertising
    Relation of size of orders to net profits
    Relation of mark-downs of goods to buying policies

Financial operations

    Ratio analysis of financial statements by banks to determine the credit
        risk of prospective borrowers
    Determination of the average discount rate of customer loans of a bank

Transportation and communication

    Ratio analysis of railroad traffic to determine operating efficiency,
        operating density, etc.
    Study of relative costs of moving freight by truck and by rail
    Study of telephone and telegraph message density



THE WORK OF THE STATISTICIAN 

Types of Statistical Work 

The type of problem with which a statistician deals is determined 
by his location in the economic structure. If he is employed by a busi- 
ness concern his work consists mainly in the analysis of problems arising 
within the concern and his data are usually the records of the concern's 
operations. The extent and character of statistical work carried on 
within any individual concern depend upon the type of business and 
the funds available. There are many firms that do not maintain sepa- 
rate statistical departments, but which conduct statistical analysis as 
an adjunct to the main function of one or several departments. Con- 
siderable information concerning the variation in statistical practice 
in different concerns can be obtained from a survey made by the 
National Industrial Conference Board. 

The Conference Board survey, which was begun in June, 1939, reveals that 
no uniform practice is followed in the organization of research. Only 30% of 
the companies maintain a separate centralized department for such work, or 
place the responsibility in the hands of a single statistician or economist. Ten 
per cent of the concerns assigned such research to a single executive. The 
majority, about 60%, divided the task among several executive offices and 
departments. 

Somewhat greater centralization of research is found in financial and public 
service companies than in manufacturing concerns. 

The greatest volume of work appears to be done in the accounting and 
controllers' departments, and the second heaviest volume falls on the executive 
offices. Next in importance come the sales division, the production department 
and, in fifth rank, the centralized statistical or economic research department. 

Most companies compile data for internal use on sales and orders, pro- 
duction, employment and purchases. Less than one-fourth of the companies 
reporting, however, attempt to compile data on inventories in the hands of 
distributors of their products. Most companies also attempt to forecast future 
trends in sales, production, costs, inventory requirements, prices and profits. 
More than half of the organizations reported that they attempt to forecast sales 
by geographic regions. 

Forty-two per cent of the companies carrying on research compile periodic 
reports on the outlook for the particular industry in which they are engaged, 
and nearly as many compile data on the prospects for business in general. 
About 15% also prepare studies on general business conditions as they affect 
purchasers of their finished products and suppliers of their raw materials. 

Other special studies carried on to a considerable extent by private industry, 
listed in the order of their importance, are the economic effects of taxation,
analysis of departmental operations, personnel practices, plant layout, effects of
legislation, and feasibility of plant expansion. 1 

While this study shows the ramifications of statistical work in business 
it does not emphasize the variety of problems encountered by an indi- 
vidual statistician. He may be asked to make cost analyses and fore- 
casts of production for the manufacturing department, analyses of time 
and motion studies for the plant scheduling department, studies of 
employment, payroll, and wages for the personnel department, sales 
quotas for the sales department, estimates of plant burden for the main- 
tenance department, studies of bad debt losses for the credit depart- 
ment, or an investigation of the relation between selling prices, sales 
volume, and turnover of inventory for the president. 

The variety of subjects included in this list demonstrates the scope 
of the work of the statistician employed by an individual concern. He 
must have considerable familiarity with the operations carried on in all 
departments of the concern and an understanding of the economic prin- 
ciples involved in those operations, in addition to a working knowledge 
of statistical methods. 

Problems of a different type are dealt with by a statistician engaged 
in independent research or employed by a trade association, a commer- 
cial research agency, a government bureau, or a university research de- 
partment. Much of the information used in this kind of statistical 
analysis is gathered from the records of individual concerns or agencies. 
The data are therefore of essentially the same nature as those used by 
each individual concern in analyzing its own problems, although the 
purpose of the analysis is different. 

Table 2 is an illustration of a study that makes use of the records 
of a number of concerns. The Bureau of Business Research of the 
Harvard Graduate School of Business Administration maintains a 
regular reporting service through which it receives annual reports of 
operations from a large number of department stores in all sections of 
the country. This table gives a summary of the turnover rates computed 
from the reports of 430 stores. The stores are divided into ten groups 
according to size as measured by annual sales. The purpose of making 
this classification is to group together stores operating under conditions 
that are as nearly similar as possible. 

1 Commercial and Financial Chronicle, Vol. 150, No. 3894 (February 10, 1940),
p. 904 (New York: William B. Dana Co.), reproduced from a National Industrial
Conference Board Report.






TABLE 2

TURNOVER OF GOODS IN DEPARTMENT STORES OF DIFFERENT SIZES IN THE
UNITED STATES, 1938 *

SIZE OF STORE AS MEASURED     NUMBER OF STORES     AVERAGE TURNOVER
BY ANNUAL SALES               REPORTING            OF GOODS

Less than $150,000                   54                  2.1
   150,000-   300,000                45                  2.7
   300,000-   500,000                58                  3.6
   500,000-   750,000                35                  3.6
   750,000- 1,000,000                28                  4.2
 1,000,000- 2,000,000                62                  4.2
 2,000,000- 4,000,000                58                  4.4
 4,000,000-10,000,000                57                  4.7
10,000,000-20,000,000                20                  4.7
20,000,000 or more                   13                  3.4

* Malcolm P. McNair, "Operating Results of Department and Specialty Stores in
1938," Bureau of Business Research Bulletin Number 109 (May, 1939), Boston: Graduate
School of Business Administration, Harvard University.

The turnover is computed by dividing the annual sales by the aver- 
age inventory. The increase in the turnover, as size of store increases, 
indicates that the smaller stores maintain larger inventories in relation 
to sales than the larger stores. This skeleton fact gives rise to many 
questions related to the analysis of department-store operations. For 
example, one might theorize as follows: the smaller stores must keep 
in stock a line of goods practically as inclusive as that maintained by 
larger stores; however, in smaller stores, demand for many types of 
goods is only occasional, whereas those same goods are in constant 
demand in larger stores; consequently the maintenance of this slow- 
moving stock reduces the turnover of the smaller stores. The testing 
of this hypothesis would be a task for the statistical staff that has 
access to the reports of the individual concerns. 
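The turnover computation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the single-store sales and inventory figures are hypothetical, and the average inventory is assumed here to be the simple mean of the inventories supplied. The counts of reporting stores are those of Table 2.

```python
def turnover(annual_sales, inventories):
    """Turnover = annual sales divided by the average inventory."""
    average_inventory = sum(inventories) / len(inventories)
    return annual_sales / average_inventory

# Hypothetical store: sales of $420,000, with inventories of $110,000
# at the start of the year and $90,000 at the end.
example_turnover = turnover(420_000, [110_000, 90_000])

# Number of stores reporting in each size group of Table 2.
stores_reporting = [54, 45, 58, 35, 28, 62, 58, 57, 20, 13]
total_stores = sum(stores_reporting)
```

For the hypothetical store the average inventory is $100,000, so the stock is turned about 4.2 times a year. The group counts add to the 430 stores mentioned in the text.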

Further examples of the type of research undertaken by statisticians 
working with the records of individual concerns are: the relation of 
advertising costs to sales; the seasonal variation in automobile sales 
in different parts of the country; the relation of bank loans to size 
of banks and population of cities in which the banks are located; and 
the rates of interest charged for installment credit according to type 
of goods purchased. In other cases the data used in research work do 
not come from business concerns but from markets, individuals, or the 
results of prior statistical work. Some illustrations are the construction 
of an index of the general price level, a study of the preferences of 
consumers for competing products, and an analysis of the relation
of the alternations of prosperity and depression to the sales of con-
sumers' goods and producers' goods. 

The studies mentioned in preceding paragraphs give some indication 
of the difference in type of problem encountered by statisticians work- 
ing for individual concerns and by those who engage in business re- 
search in some other capacity. Specific techniques that are important 
in one type of work may be less so in the other, but a common body 
of method is used in either case. Since the primary purpose of this 
book is to present a systematic development of statistical method, no 
occasion will arise for keeping the two types of statistical problems 
separate. Examples will be drawn from either to illustrate the discus- 
sion of methods employed in both. 

Statistical Background of Business Activity 

The extent to which business operations depend upon a background 
provided by statisticians is not generally realized. This statement is 
equally applicable to every technical specialist who is a part of the 
business structure, but the obscurity of the statistician's contribution is 
particularly striking because of the wide ramifications of his work. 

The nature of the work of the statistician can be explained with the 
aid of some published statements concerning business affairs. 

Example 1. "Wages [in 1938]," Swift and Company Year Book
(1938), p. 26:

Since 1923 the average hourly wage rate for Swift & Company's Chicago 
plant workers has increased by 52 per cent, while the number of hours in the 
basic working week has been reduced from 48 to 40. Actual weekly earnings 
per worker are about 37 per cent greater than in 1923. Taking into account 
the changes in living costs, these weekly earnings provide Swift & Company 
plant workers with approximately 57 per cent higher "real" wages than they 
received in 1923. 

The statistical department of Swift and Company presumably main- 
tains employment records containing average hourly wages, the number 
of hours worked per week, and the average weekly earnings per worker 
in 1923 and in 1938. The computation of the per cents of increase is, 
of course, routine work. Indexes of the cost of living are published 
by the United States Bureau of Labor Statistics and the National Indus- 
trial Conference Board. A comparison of weekly wages of Swift and 
Company employees with a cost of living index gives the increase in 
real wages. 
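The comparison of money wages with a cost of living index can be sketched as follows. The two wage figures are taken from the quotation. The cost of living index is not given in the source, so the value computed here is only the ratio implied by those two figures, not a published index number, and the function name is an assumption made for illustration.

```python
def real_wage_index(nominal_index, cost_of_living_index):
    """Real wage index = money wage index / cost of living index (base year = 1.00)."""
    return nominal_index / cost_of_living_index

nominal_1938 = 1.37  # weekly earnings, 1923 = 1.00 (37 per cent greater)
real_1938 = 1.57     # "real" wages, 1923 = 1.00 (57 per cent higher)

# Cost of living ratio implied by the two quoted figures (an inference, not a published value).
implied_cost_index = nominal_1938 / real_1938
```

The implied ratio is roughly 0.87, that is, the quoted figures are consistent with living costs in 1938 about 13 per cent below those of 1923.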




Example 2. "Your Food Supplies and Costs," Consumers' Guide,
Vol. V, No. 10 (October 24, 1938), p. 16:

EGGS. Supplies are expected to continue smaller than a year earlier during 
the remainder of 1938, but in 1939 supplies probably will be bigger than in 
the current year. Relatively small stocks of storage eggs, coupled with smaller 
fresh egg production than in 1937 have been the major factors behind the 
larger than usual price upswing this year. Storage stocks are an important 
source of supply during the last quarter of the year when fresh egg production 
reaches its lowest level. Current storage stocks are almost a third under a 
year ago. 

Continuation of the present rate of increase in prices would result in peak 
egg prices in November considerably above their 1937 level and might result 
in the highest prices since 1930. There is some possibility, however, that fresh 
egg production may comprise a larger than usual proportion of total supplies 
during the last two months of the year because of the large hatchings this 
spring. This condition would offset part of the price boosting effect of small 
storage stocks. Retail egg prices went up 5 cents a dozen from August to 
September and were a cent a dozen higher than last September. 

The statistical background for this analysis of egg prices has been 
supplied by the United States Department of Agriculture. Local offices 
of the department in all parts of the country send regular reports to 
Washington concerning conditions in their areas. The analysis of these 
reports by the statistical division provides information concerning the 
supply of eggs for the latter part of 1938 and early 1939, the size of 
cold storage holdings, the prices of eggs during the year, and the 
prospective supply of egg-laying pullets. Previous studies of the depart- 
ment afford a basis for the statement that "storage stocks are an impor- 
tant source of supply during the last quarter of the year when fresh 
egg production reaches its lowest level." Comparisons of current re- 
ports with department records show absolute and relative price changes 
from earlier months as well as earlier years. 

Example 3. Buffalo Evening News (November 22, 1938), p. 29: 

YULE SALES MAY EQUAL $1,200,000,000 OF 1937 

NEW YORK, Nov. 22 (AP). A busy Christmas shopping season was 
foreseen today by the National Retail Dry Goods Association. 

An analysis by its accounting experts, the association reported, indicated 
dollar sales in department and apparel specialty stores of the nation in the 
four weeks preceding Christmas may approximate $1,200,000,000, about the 
same as in the comparable 1937 period. 




Actually, the number of items traded across store counters, it was pointed 
out, may exceed last year's Christmas trade because department store prices this 
year are about 7 per cent lower on the average. 

As stated in the article the accountants have estimated that sales 
during the Christmas season of 1938 will be very satisfactory, but the 
main point is the work of the statistician which is back of the innocent- 
looking statement that prices of department store goods are about 7 per 
cent lower than they were in 1937. This conclusion is probably based 
on the "Index of Prices of Department Store Goods" prepared 
monthly by A. W. Zelomek and published in Fairchild Publications. 
This index is based on prices of 105 nonstyle items collected monthly 
from 53 retail-trade organizations. 

Example 4. "The Trend of Business," Dun's Review, Vol. 46, 
No. 2121 (May, 1938), pp. 30-31: 

On the charts, the present state of business activity bears some resemblance 
to that of 1934. A few of the more important measures (national income,
department store sales, wholesale prices, construction contracts) are still above
early 1936 levels, considerably higher than in 1934. On the other hand, indus- 
trial production is down to the 1934 average; primary distribution, measured 
by railroad carloadings, is the lowest since November, 1934; the Annalist index 
of business activity for March, the lowest since November, 1934; the Times 
average of 50 stock prices for the first three weeks of April, the lowest since 
September, 1934. 

The first sentence indicates that charts have been prepared by statisti- 
cians showing the course followed by various indicators of business 
conditions in recent years. The computation of the national income 
requires the continuous attention of a corps of statisticians in the 
United States Department of Commerce. Department-store sales are 
reported by over 400 individual stores to the Federal Reserve Banks 
of the districts in which the stores are located. Indexes of sales are
prepared for each district as well as for the United States as a whole. 
Wholesale price indexes are prepared by a number of statistical agen- 
cies, but the most widely used index is that of the United States 
Bureau of Labor Statistics computed by an elaborate technique and 
based on prices of over 800 commodities. Data on construction con- 
tracts are collected by the F. W. Dodge Corporation through local 
offices and correspondents in 37 states east of the Rocky Mountains. 
An Index of Industrial Production is published by the Board of Gov- 
ernors of the Federal Reserve System. The research staff prepares the
index by the application of elaborate statistical techniques to data com-
piled from trade journals, reports of trade associations and government 
bureaus. Railroad car loadings are collected from individual railroad 
companies and prepared for publication by the Car Service Division of 
the Association of American Railroads. The Annalist index of business 
activity is a cyclical index corrected for trend and seasonal variation by 
an involved statistical process. The New York Times average of 50 
stocks is a product of the newspaper's research staff. 

Articles similar to these are presented every day to the reading pub- 
lic and they exert a widespread influence over the conduct of business 
affairs. These four examples give some indication of the variety of 
the activities of business statisticians and of the multiplicity of methods 
and techniques they employ. The orderly development of basic meth- 
ods and techniques and their relation to various business activities 
become the subject matter of a textbook in statistics. 

PROBLEMS 

1. What distinguishes statistical data from abstract numbers? 

2. Apply this distinction to the following; give reasons for your answer in 
each case: 

A

NUMBERS     SQUARES     SQUARE ROOTS     RECIPROCALS

   51         2601         7.1414          .019608
   52         2704         7.2111          .019231
   53         2809         7.2801          .018868
   54         2916         7.3485          .018519
   55         3025         7.4162          .018182

B 

REPORT OF OVERTIME WORKED BY LOCAL BRANCHES OF A LABOR UNION

AMOUNT OF OVERTIME                      NO. OF LOCALS

None                                          2
Occasionally                                  3
Never more than 6 hours per week              1
When necessary                                3
Five hours regularly                          2

Total                                        13

C 

ADDITIONS TO TERRITORY OF CONTINENTAL UNITED STATES AFTER 1783

TERRITORY                  DATE OF ADDITION

Northwest Territory              1787
Louisiana Purchase               1803
Florida                          1819
Texas                            1845
Oregon                           1846
Mexican Cession                  1848
Gadsden Purchase                 1853



D

HOURLY UNSKILLED HIRING WAGE RATE OF A GROUP OF MANUFACTURING CONCERNS
IN 1936

CONCERN          HOURLY WAGE RATE
                    (in Cents)

   A                    32
   B                    36
   C                    30
   D                    40
   E                    35

3. What are the differences between the study of general statistics and busi- 
ness statistics? 

4. Why is statistics not listed as a division of business activity? 

5. List the differences in function of the statistician employed by a private 
concern and one employed by some other type of organization. 

6. In the preparation of which of the following reports would the statistician 
initially compiling the data be employed by an industrial concern? 

a) Monthly production of automobiles and trucks by General Motors. 

b) Weekly freight car loadings of coal in the United States. 

c) Daily bank clearings of the Buffalo Clearing House Association. 

d) The monthly production of crude oil in the United States. 

e) The net profits of the Erie Railroad for the first six months of 1930.

f) The daily messages carried by the New York Telephone Company.

g) The number of airplanes arriving and departing at the Buffalo airport.

h) Number of bank employees:



NAME OF BANK          EXECUTIVES     CLERKS

Export                    23           180
First National            14           200
etc.

Total                    212          1325



7. Describe the statistical material found on the financial pages of an urban 
newspaper. Be sure to give exact reference to the issue and edition of 
the paper. 

8. Select from a current publication an article similar to the examples in the 
text. State what work has been done by statisticians in the preparation of 
the article. Give exact reference. 

REFERENCES 

BROWN, THEODORE H., "Problems Met by Companies That Instruct Their
Employees in Statistical Methodology," Journal of the American Statistical
Association, Vol. XXVII, No. 181A (March, 1933, Supplement), pp. 10-14.

A suggested program of statistical work that could be carried on within a
business concern.




BURGESS, ROBERT W., "The Whole Duty of the Statistical Forecaster," Journal 
of the American Statistical Association, Vol. XXVII, No. 181A (March, 
1933, Supplement), pp. 636-42. 

The first part of this article includes some excellent examples of the types 
of analysis made by statisticians. 

FALKNER, ROLAND P., "The Scope of Business Statistics," Quarterly Publica- 
tions of the American Statistical Association, Vol. XVI, No. 122 (June, 
1918), pp. 24-29. 
An earlier attempt to define business statistics that is still pertinent. 

HATHAWAY, WILLIAM A., "Internal and External Statistical Needs of American 
Business," Quarterly Publications of the American Statistical Association, 
Vol. XVI, No. 122 (June, 1918), pp. 1-15. 

Contains background essential to an understanding of modern statistical
development.

PARMALEE, JULIUS H., "The Utilization of Statistics in Business," Quarterly 
Publications of the American Statistical Association, Vol. XV, No. 117 
(June, 1917), pp. 565-76. 

An early statement of the types of statistical analysis available to business 
men. 

YOUNG, BENJAMIN A., Statistics as Applied in Business. New York: The 
Ronald Press Company, 1925. 

Chapters I to VIII contain a very complete statement of the character of 
internal and external statistical work and a detailed presentation of types of 
statistical problems. 



CHAPTER II 
THE USE OF NUMBERS 

INTRODUCTION 

IN BUSINESS practice there is an increasing trend toward the 
expression of ideas in numerical form. The manager of a store 
no longer reports that business is improving, but that sales last 
month were 16 per cent better than in the corresponding month of 
last year. The banker no longer relies solely upon his personal judg- 
ment in granting loans, but uses a set of ratios, derived from the finan- 
cial statement of a prospective borrower, to aid him in determining 
the concern's credit standing. A similar tendency toward more precise 
methods can be found in all parts of the business structure. In no small 
degree this tendency accounts for the increasing demand for a knowl- 
edge of statistical methods. 

On the other hand teachers in various parts of the country have 
remarked that young men and women of college age show a decline 
in ability to carry out numerical operations and particularly an increas- 
ing inability to think in numerical terms. It is not the function of a 
textbook in statistics to reverse this tendency. The fault is too funda- 
mental for that. This condition does explain, however, why it is desir- 
able to pause for a brief statement concerning methods of computation 
before proceeding with the development of statistical techniques. 

The necessary computation which accompanies statistical work 
consumes a vast amount of time. The greater part of such computation 
is purely repetitive in character; consequently methods of shortening 
the time spent in doing it will allow more of the student's time to be 
spent in studying statistics and less in practicing arithmetic. Hence 
the following pages are devoted to a review of arithmetic operations. 

THE FUNDAMENTAL OPERATIONS 

Addition 

Computations should be performed rapidly. The advantage in 
speed lies not merely in the time saved, but mainly in the confidence
gained for those who waver and in the attention preserved for those
whose minds might wander. There is some advantage here in illustrat- 
ing the wrong method. Suppose that the following columns are to 
be added: 

    2641
     362
     570
    1369
    8147
    4216
    2164

All too commonly in the author's experience the student's mind goes 
through the following steps: adding up from the bottom, 4 and 6 are 
10, 10 and 7 are 17, 17 and 9 are (then 9 are told off on the fingers) 
are 26, 26 and 2 are (I think I'll go to lunch after this class). Let me 
see, I was adding something. Oh yes, 4 and 6 are 10, etc. 

To eliminate this wandering the addition should be done at the 
maximum speed possible, naming only the successive sums. So, first 
column, 10, 17, 26, 29; second column, 9, 19, 26, 36; third column, 
6, 10, 15, 24; fourth column, 8, 16, 19. There is everything to be gained 
and nothing to be lost by performing addition at a rate of speed which 
will leave no time to worry about an impending lunch hour. There are 
students who can be busily engaged for as much as 2 minutes in adding 
these columns of figures, whereas 15 seconds is the maximum time 
which should be spent. 

Those who have difficulty with the amount to be carried from one 
column to the next may prefer to write the total of each column 
separately as indicated below. 




This method is advantageous if one is likely to be interrupted. 
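The practice of writing each column total separately can be sketched as follows, using the seven numbers given above. The function names are hypothetical. Each column is totaled on its own, without carrying, and the totals are then combined according to their place values.

```python
def column_totals(numbers):
    """Total each digit column separately, units column first, without carrying."""
    width = len(str(max(numbers)))
    return [sum((n // 10**place) % 10 for n in numbers) for place in range(width)]

def combine(totals):
    """Shift each column total to its place value and add the results."""
    return sum(total * 10**place for place, total in enumerate(totals))

numbers = [2641, 362, 570, 1369, 8147, 4216, 2164]
totals = column_totals(numbers)   # units, tens, hundreds, thousands
grand_total = combine(totals)
```

Here the column totals are 29, 34, 21, and 17. Written one place further to the left each time (29 + 340 + 2100 + 17000) they give 19469, the same sum reached by carrying.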

Subtraction 

The results of subtraction should always be checked by adding the 
subtrahend and the remainder. 




        38642 minuend
    (−) 12966 subtrahend

        25676 remainder
    (+) 12966 subtrahend

        38642 check

Subtraction of numbers can be performed mentally by using a 
method of excess and deficit in relation to a round number such as 100. 
In subtracting 69 from 118, the minuend is 100 + 18, the subtrahend
is 100 − 31; therefore the difference is 18 + 31 = 49. Other similar
examples are:

     263 = 200 + 63             547 = 500 + 47
     186 = 200 − 14             490 = 500 − 10
      77 remainder               57 remainder

    1245 = 1000 + 245          2317 = 2200 + 117
     893 = 1000 − 107          2049 = 2200 − 151
     352 remainder              268 remainder

After this method has been mastered, it will not be necessary to put 
any of the computation on paper. With sufficient practice the method 
can be used in fairly complicated operations. For example, 

      12286 = 12000 + 286                      4260 = 4000 + 260
     − 1143 =  1000 + 143  (subtracting)     − 2749 = 3000 − 251
              11000 + 143 = 11143                     1000 + 511 = 1511
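The method of excess and deficit can be expressed as a short sketch. The function name and the choice of the round number as an explicit argument are assumptions made for illustration; in mental work the round number is simply whatever is convenient.

```python
def subtract_by_round_number(minuend, subtrahend, round_number):
    """Subtract by measuring both numbers against a convenient round number."""
    excess = minuend - round_number       # amount by which the minuend exceeds it
    deficit = round_number - subtrahend   # amount by which the subtrahend falls short
    return excess + deficit

first = subtract_by_round_number(118, 69, 100)       # 18 + 31
second = subtract_by_round_number(263, 186, 200)     # 63 + 14
third = subtract_by_round_number(2317, 2049, 2200)   # 117 + 151
```

When the subtrahend lies above the round number, as in 12286 − 1143, the "deficit" is negative and the sketch subtracts it, which agrees with the worked example.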

Multiplication 

The greatest saving of time in multiplication results from the use 
of short cuts. These are derived from well-known principles of arith- 
metic and algebra as indicated in the examples of various methods 
which follow. 

The Use of Reciprocals. 1 One number multiplied by another is the 
same as the first number divided by the reciprocal of the second. 

1. 763 × 5 = 763 ÷ .2 = 7630 ÷ 2 = 3815

2. 1582 × 25 = 1582 ÷ .04 = 158200 ÷ 4 = 39550

3. 220 × 50 = 220 ÷ .02 = 22000 ÷ 2 = 11000

4. 17228 × 125 = 17228 ÷ .008 = 17228000 ÷ 8 = 2153500

5. 15415 × .16 2/3 = 15415 ÷ 6 = 2569.17
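The rule that multiplying by a number is the same as dividing by its reciprocal can be checked with exact fractions. This is a sketch only; the function name is hypothetical, and Python's standard `fractions` module is used so that reciprocals such as 1/125 are carried without rounding.

```python
from fractions import Fraction

def multiply_via_reciprocal(number, multiplier):
    """Multiply by dividing by the reciprocal of the multiplier."""
    reciprocal = 1 / Fraction(multiplier)
    return Fraction(number) / reciprocal

by_5 = multiply_via_reciprocal(763, 5)          # 763 divided by 1/5
by_25 = multiply_via_reciprocal(1582, 25)       # 1582 divided by 1/25
by_125 = multiply_via_reciprocal(17228, 125)    # 17228 divided by 1/125
by_sixth = multiply_via_reciprocal(15415, Fraction(1, 6))  # .16 2/3 is one-sixth
```

The last case shows the shortcut at its most useful: multiplying by .16 2/3 is simply dividing by 6.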

1 The reciprocal of a number is defined as unity divided by the number, i.e., the
reciprocal of 5 is 1 ÷ 5 = .2. The reciprocal of 40 is 1 ÷ 40 = .025. The reciprocal
of .25 is 1 ÷ .25 = 4.




Multiplier Near Ten or a Power of Ten.

6. 27 × 99 = 27(100 − 1) = 27(100) − 27(1) = 2700 − 27 = 2673

7. 366 × 1001 = 366000 + 366 = 366366

8. 2746 × 11 = 27460 + 2746 = 30206

Squaring Numbers Ending in Five. 2

9. 25² = (2 × 3)100 + 25 = 625, i.e., 2(2 + 1) and annex 25

10. 75² = (7 × 8)100 + 25 = 5625, i.e., 7(7 + 1) and annex 25

11. 105² = (10 × 11)100 + 25 = 11025, i.e., 10(10 + 1) and annex 25

12. 405² = (40 × 41)100 + 25 = 164025, i.e., 40(40 + 1) and annex 25

Last Digits Totaling Ten. 3

13. 67 × 73 = (70 − 3)(70 + 3) = 4900 − 9 = 4891

14. 95 × 105 = (100 − 5)(100 + 5) = 10000 − 25 = 9975

15. 89 × 111 = (100 − 11)(100 + 11) = 10000 − 121 = 9879

16. 4.1 × 2.9 = (3.5 + .6)(3.5 − .6) = 12.25 − .36 = 11.89

17. 640 × 5.6 = 100(6.4 × 5.6) = 100(6 + .4)(6 − .4) = 100(36 − .16)
= 100 × 35.84 = 3584

A Method of Squaring Any Number. 4

18. 72² = (72 − 2)(72 + 2) + 2² = (70 × 74) + 4 = 5180 + 4 = 5184

19. 153² = (153 − 3)(153 + 3) + 3² = (150 × 156) + 9 = (100 × 156)
+ (50 × 156) + 9 = 15600 + 7800 + 9 = 23409
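The three short cuts just illustrated all rest on the identity (a + b)(a − b) = a² − b². A minimal sketch, with hypothetical function names, shows each one as a small routine that can be checked against ordinary multiplication.

```python
def square_ending_in_five(n):
    """Multiply the part left of the 5 by one more than itself and annex 25."""
    if n % 10 != 5:
        raise ValueError("the number must end in 5")
    left = n // 10
    return left * (left + 1) * 100 + 25

def product_by_difference_of_squares(x, y):
    """Write x and y as (a - b) and (a + b); the product is then a*a - b*b."""
    if (x + y) % 2:
        raise ValueError("x + y must be even for a whole-number midpoint")
    a = (x + y) // 2
    b = abs(y - x) // 2
    return a * a - b * b

def square_via_round_number(n, b):
    """Square n as (n - b)(n + b) + b*b, choosing b so that n - b is round."""
    return (n - b) * (n + b) + b * b
```

For instance, `square_ending_in_five(405)` gives 164025, `product_by_difference_of_squares(89, 111)` gives 9879, and `square_via_round_number(153, 3)` gives 23409, in agreement with examples 12, 15, and 19.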

Division

There are a few worthwhile short cuts in division, based mainly on
the use of reciprocals. 5 Commonly used among these are,

20. 5725 ÷ 25 = 5725 × .04 = 57.25 × 4 = 229

21. 280400 ÷ 50 = 2804 × 2 = 5608

22. 12750 ÷ 500 = 12.75 × 2 = 25.5

23. 245925 ÷ 125 = 245.925 × 8 = 1967.4

It is to be expected that the reader who employs the few short cuts
listed here will develop many more to aid him as he progresses in the
use of numbers. There is no standard set which can be recommended
for the use of everyone. Each person should employ those which result
in time saving for him and come to mind naturally. Just as every
person has an individual style in writing so every person will develop
an individual set of short cuts in computation.

2 To square any number ending in 5 multiply the part of the number to the left
of 5 by one more than itself and annex 25 to the product.

3 Examples 13 to 17 make use of the algebraic identity (a + b)(a − b) = a² − b².

4 The same algebraic identity is used as in the preceding examples but the form is
changed to a² = (a + b)(a − b) + b².

5 The use of reciprocals changes division to multiplication whereas the use of
reciprocals on a preceding page changed multiplication to division.

The Order of Performing the Fundamental Operations 

When the operations of addition and subtraction are employed in 
a problem, the order of performing them makes no difference in the 
result. Thus: 

   50 + 275 − 36 + 5 − 210 = 84
or 275 − 36 + 50 − 210 + 5 = 84
or −210 + 50 − 36 + 275 + 5 = 84

The introduction of parentheses into such a series indicates that the 
operations within the parentheses must be performed first. There will 
be no difference in the result when the sign preceding a parenthesis 
is plus, but when it is minus, it has the effect of reversing the sign 
of every figure inclosed. Thus: 

69 − 63 + 58 − 10 = 54
69 − (63 + 58) − 10 = 69 − 121 − 10 = −62
69 − (63 + 58 − 10) = 69 − 111 = −42
but
69 − 63 + (58 − 10) = 69 − 63 + 48 = 54

When the operations of multiplication and division are employed 
in a problem, the order of performing the division does alter the result; 
hence in order to avoid ambiguity, it is necessary to inclose in paren- 
theses the figures that are intended to be used together as numerator 
and denominator. Thus: 

(250 ÷ 10) × 2 = 25 × 2 = 50
but
250 ÷ (10 × 2) = 250 ÷ 20 = 12.5

If several signs of grouping are used in the same problem, the rule
is "Work from the inside out." Thus:

−5[{26(36 ÷ 9)} ÷ 52] = −5[{26 × 4} ÷ 52] = −5[104 ÷ 52]
= −5 × 2 = −10

When multiplication or division or both appear in a problem along
with addition or subtraction or both, with no parentheses, the opera-
tions of multiplication and division must be performed first. If paren-
theses are introduced, the rules already quoted will apply. Thus:

550 + 10 × 7 − 60 ÷ 5 = 550 + 70 − 12 = 608

(550 + 10) × (7 − 60) ÷ 5 = 560 × (−53) ÷ 5 = −29680 ÷ 5 = −5936

(550 + 10) × [7 − (60 ÷ 5)] = 560 × [7 − 12] = 560 × (−5) = −2800
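These rules of order are the same ones followed by most programming languages, so the examples can be checked directly. The sketch below uses Python, where `*` and `/` are performed before `+` and `-` and parentheses are worked from the inside out; the variable names are chosen only for illustration.

```python
# Multiplication and division first, then addition and subtraction.
no_parentheses = 550 + 10 * 7 - 60 / 5

# Parentheses change which figures are used together.
grouped_once = (550 + 10) * (7 - 60) / 5
grouped_twice = (550 + 10) * (7 - (60 / 5))

# Several signs of grouping: work from the inside out.
nested = -5 * ((26 * (36 / 9)) / 52)

# A minus sign before a parenthesis reverses the sign of every figure inclosed.
minus_before_group = 69 - (63 + 58) - 10
same_without_group = 69 - 63 - 58 - 10
```

The three arrangements of the same figures give 608, −5936, and −2800, which shows how much the placing of parentheses matters.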



FRACTIONS 

Sometimes computations are carried on in common fractions and 
sometimes in decimals. It is desirable therefore to know how to per- 
form operations with both and how to convert one into its equivalent 
in the other. 

Common Fractions 

Addition and Subtraction. To add 1/2, 1/3, and 1/5, the common denomi-
nator must first be found. The common denominator is the small-
est number divisible by the individual denominators, in this case
2, 3, and 5. By inspection this is 30; there is no smaller number
divisible by 2, 3, and 5. The three fractions with the common denomi-
nator 30 are 15/30 + 10/30 + 6/30 = 31/30 or 1 1/30. Suppose that four
mixed numbers with the denominators 36, 15, 20, and 9 were to be added.
When, as in this case, the
common denominator is not evident by inspection, the general method
of finding it is to reduce all of these individual denominators to their
prime factors and take the product of the prime factors appearing
in the reduction. The form for finding the common denominator of
the given fractions is:



Divisors          Denominators

   2         36     15     20      9
   2         18     15     10      9
   3          9     15      5      9
   3          3      5      5      3
   5          1      5      5      1
              1      1      1      1



The process consists in dividing by 2 as long as any of the denominators
are divisible by 2. Then do the same with 3 and so on using only
prime 6 numbers as divisors until unity is reached in each column.
When any denominator is not divisible in any row, it is simply carried
along until a divisor is used by which that denominator is divisible.
The common denominator will be the product of the prime divisors
on the left, i.e., 2 × 2 × 3 × 3 × 5 = 180. The four fractions are then
expressed in terms of the common denominator and added.

Subtraction of fractions is performed by the same process of reduc- 
ing to a common denominator. For example, in subtracting 6^- from 
13& both fractions should be changed to the common denominator 90. 
Then ijft - 6 = 12-V7- - 6fJ = 6U = 6ft- 
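The procedure of dividing by successive primes can be sketched as follows. The function name is hypothetical. The loop tries 2, 3, 4, 5, and so on, but a composite trial divisor such as 4 never divides anything, because all factors of 2 have already been removed by the time it is reached; only primes are therefore recorded.

```python
def common_denominator(denominators):
    """Divide by 2 as long as any denominator is divisible, then by 3, and so on."""
    remaining = list(denominators)
    divisors = []
    trial = 2
    while any(d > 1 for d in remaining):
        if any(d % trial == 0 for d in remaining):
            divisors.append(trial)
            # Denominators not divisible in this row are simply carried along.
            remaining = [d // trial if d % trial == 0 else d for d in remaining]
        else:
            trial += 1
    product = 1
    for divisor in divisors:
        product *= divisor
    return divisors, product

divisors, lowest = common_denominator([36, 15, 20, 9])
_, by_inspection = common_denominator([2, 3, 5])
```

For the denominators 36, 15, 20, and 9 the recorded divisors are 2, 2, 3, 3, and 5, whose product is 180; for 2, 3, and 5 the result is 30, as found by inspection.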

Multiplication. The type of multiplication problem most common
in statistical work involving common fractions is similar to that met
by the bookkeeper in extending bills or inventories, e.g., 4 5/6 dozen
shirts at $16 1/2 per dozen. Reduce each number to an improper fraction:
4 5/6 = 29/6 and 16 1/2 = 33/2; then the total value of the shirts would be

29/6 × 33/2 = (29 × 33)/(6 × 2) = (29 × 11)/(2 × 2) = 319/4 = $79 3/4.
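The extension of a bill in common fractions can be checked with exact arithmetic. The sketch below reads the example as 4 5/6 dozen at $16 1/2 per dozen, a reading that is consistent with the improper fraction 29 and the result 319/4 visible in the computation; the helper name is hypothetical.

```python
from fractions import Fraction

def mixed(whole, numerator, denominator):
    """Convert a mixed number to an improper fraction."""
    return whole + Fraction(numerator, denominator)

quantity = mixed(4, 5, 6)    # dozens of shirts, as an improper fraction
price = mixed(16, 1, 2)      # dollars per dozen, as an improper fraction
total_value = quantity * price
```

The product reduces to 319/4, or $79.75.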

Division. The rule for finding the value of the quotient of two 
fractions is: invert the fraction in the denominator and multiply. 
Some examples are: 
f-i 

4ft - ii 

2 T 5 TX7f 



Decimal Fractions 

The preceding paragraphs have dealt with problems containing 
common fractions. It is equally necessary to be conversant with meth- 
ods of dealing with decimal fractions. In fact decimals are more 
frequently employed in statistical work than common fractions. The 
use of calculating machines requires the expression of fractional 
amounts decimally. It is necessary therefore that statisticians acquire 
facility in handling both common and decimal fractions and be able 
to convert one to the other automatically. To convert a common frac- 
tion to a decimal the numerator is divided by the denominator, i.e., 

1/5 = 1.0 ÷ 5 = .2 

3/8 = 3.000 ÷ 8 = .375 
5/140 = 5.000 ÷ 140 = .0357 



6 A prime number is one that is divisible only by unity and the number itself. 



Decimal Fractions and Per Cents. When any number is expressed 
as a decimal or a per cent, it simply means that the numerator of a 
common fraction is written, the denominator being understood without 
writing it. Decimals mean a certain part of one unit, while per cents 
mean a certain part of 100 units. Thus, .5 means five-tenths of one 
unit, or one-half, while 50 per cent means 50 of 100 units. Obviously 
1/2 of every one and 50 of every hundred are merely two ways of 
expressing the same relation; hence we say that .5 is equivalent to 50 
per cent. The rule is: To express a decimal as a per cent move the deci- 
mal point two places to the right. The reverse rule is: To express a 
per cent as a decimal move the decimal point two places to the left. 

It is important for the statistician to be able to change from com- 
mon fractions or decimals to per cents and the reverse with accuracy 
and without consuming much time in the process. Table 3 gives a 
list of equivalents that can be referred to until their use becomes 
automatic. 
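
The conversions among common fractions, decimals and per cents can be sketched in a few lines of Python. This is an illustration only; the function names `to_decimal` and `to_percent` and the choice of four and two decimal places are assumptions made here, not part of the text.

```python
from fractions import Fraction


def to_decimal(fraction, places=4):
    """Divide the numerator by the denominator."""
    return round(fraction.numerator / fraction.denominator, places)


def to_percent(fraction, places=2):
    """Same division, with the decimal point moved two places to the right."""
    return round(100 * fraction.numerator / fraction.denominator, places)


# The three conversions worked in the text: 1/5, 3/8 and 5/140.
for f in (Fraction(1, 5), Fraction(3, 8), Fraction(5, 140)):
    print(f, to_decimal(f), to_percent(f))
```

Note that `Fraction(5, 140)` is reduced automatically to 1/28, which has the same decimal equivalent.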

TABLE 3 
LIST OF COMMON FRACTIONS WITH THEIR DECIMAL AND PER CENT EQUIVALENTS 

COMMON     EQUIVALENT   EQUIVALENT     COMMON     EQUIVALENT   EQUIVALENT
FRACTION   DECIMAL      PER CENT       FRACTION   DECIMAL      PER CENT

1/1000     .001         .1             5/8        .625         62.5
1/500      .002         .2             7/8        .875         87.5
1/400      .0025        .25            1/6        .1666...     16.66...
1/300      .00333...    .333...        5/6        .8333...     83.33...
1/250      .004         .4             1/5        .2           20.
1/200      .005         .5             2/5        .4           40.
1/160      .00625       .625           3/5        .6           60.
3/400      .0075        .75            4/5        .8           80.
1/100      .01          1.             1/4        .25          25.
1/80       .0125        1.25           3/4        .75          75.
1/50       .02          2.             1/3        .333...      33.33...
1/40       .025         2.5            2/3        .666...      66.66...
3/100      .03          3.             1/2        .5           50.
1/30       .0333...     3.33...        1 1/2      1.5          150.
1/25       .04          4.             1 1/3      1.333...     133.33...
1/20       .05          5.             1 1/4      1.25         125.
1/16       .0625        6.25           1 3/4      1.75         175.
1/15       .0666...     6.66...        2 1/5      2.2          220.
1/12       .0833...     8.33...        3 5/8      3.625        362.5
5/12       .4166...     41.66...       4 7/8      4.875        487.5
7/12       .5833...     58.33...       5 5/6      5.8333...    583.33...
11/12      .9166...     91.66...       8 1/10     8.1          810.
1/8        .125         12.5           10 5/16    10.3125      1031.25
3/8        .375         37.5           12 7/20    12.35        1235.


Calculation of Per Cents. The three terms of a per cent calculation 
are: (1) the base, b, (2) the rate, r, and (3) the percentage, p. The 
fundamental relation is b X r = p. Given any two of these terms 
the third one can be found from the fundamental relation. There are, 
therefore, three types of problems which arise. Each of these is illus- 
trated in the examples which follow: 



b × r = p 

Example 1: How much is 5 per cent of 12420? 

This means that 5 of every hundred in 124.20 hundreds are to be 
counted, so 5 × 124.20 = 621; hence 5 per cent of 12420 is 621. 

The simpler way of doing the same thing is to multiply the given 
number, 12420, by .05, which gives 621. That is, instead of taking 5 out 
of every hundred in the original number, take .05 out of every one in 
the original number. 

Example 2: How much is 364 per cent of 1250? 

1250 × 3.64 = 4550. 

Example 3: How much is 750 increased by 40 per cent of itself? 

750 + (750 × .4) = 750 + 300 = 1050, or 750 × 1.4 = 1050. 

Example 4: How much is 2/5 of 4875? 2/5 per cent of 4875? 

4875 × .4 = 1950. 4875 × .004 = 19.50. 

p ÷ r = b 

Example 5: 450 is 75 per cent of what number? 

If 75 per cent of a certain number is 450, then 1 per cent of the 
number is 1/75 of 450 or 6. If 1 per cent of a number is 6, then 
100 per cent of the number is 100 times 6, or 600. Therefore the 
number is 600. The work can be shortened as follows: 450 ÷ .75 = 600. 

Example 6: 375 is 12 1/2 per cent of what number? 

375 ÷ .125 = 3000. 

Another solution would use the 12 1/2 per cent as 1/8. The problem 
would then read, 375 is 1/8 of what number? The number is 375 × 8 = 
3000. 

Example 7: 12500 is 4/3 of what number? 

If 12500 is 4/3 of the number, then 1/3 of the number would be 1/4 of 
12500 or 3125. If 1/3 of the number is 3125, then 3/3 or 100 per cent 
of the number would be 3 times 3125, or 9375. Therefore the number 
is 9375. The pencil and paper solution would be 12500 ÷ 1.3333 = 
9375 plus a remainder. This remainder is, of course, due to the use 
of an approximate divisor. The advantage of the common fraction 
solution is obvious. 



p ÷ b = r 

Example 8: What per cent of 24 is 3? 

This problem may be worked in two ways: 

a) 3 is 1/8 of 24 or 12 1/2 per cent of 24. 

b) 3 ÷ 24 = .125 or 12 1/2 per cent. 

Example 9: What per cent of 8100 is 17415? 

17415 ÷ 8100 = 2.15 or 215 per cent. 

Although the wording of a problem may somewhat obscure the 
case, all per cent calculations can be expressed in one of these three 
forms. With sufficient practice in dealing with per cent problems no 
difficulty should be encountered in determining which of the three 
forms is to be used. 
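
The three forms of the per cent calculation can be sketched as three small Python functions built on the fundamental relation b × r = p. This is only an illustration: the function names are assumptions made here, and exact fractions are used so that the checks do not depend on decimal rounding.

```python
from fractions import Fraction


def find_percentage(base, rate):
    """First form: b x r = p."""
    return Fraction(base) * Fraction(rate)


def find_base(percentage, rate):
    """Second form: p / r = b."""
    return Fraction(percentage) / Fraction(rate)


def find_rate(percentage, base):
    """Third form: p / b = r."""
    return Fraction(percentage) / Fraction(base)


print(find_percentage(12420, "0.05"))      # Example 1
print(find_base(450, "0.75"))              # Example 5
print(find_base(12500, Fraction(4, 3)))    # Example 7, read as 4/3
print(float(find_rate(17415, 8100)) * 100) # Example 9, as a per cent
```

Rates are passed as decimal strings or fractions so that a value such as 4/3 can be used without the approximate divisor 1.3333.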

SQUARE ROOTS 

The five commonly used methods of determining square roots of 
numbers are (a) by inspection, (b) by arithmetic, (c) by the use of 
logarithms, (d) by the use of a table of square roots, and (e) by the 
use of a slide rule. Only the first and second methods will be discussed 
at this time. The use of logarithms is explained in Appendix C. A 
table of square roots has been provided in Appendix D. The use of 
a slide rule can be learned best from the manual of instructions pro- 
vided by the manufacturers of slide rules. 

By Inspection 

The approximate value of the square roots of many numbers can 
be ascertained by a process of mental interpolation, if one has at com- 
mand the values of the square roots of a few numbers or makes use 
of the short-cut method of squaring numbers ending in five. The 
inspection method can be explained easily by the use of an example. 
Suppose the square root of 457 were wanted. Twenty squared is 400 
and twenty-five squared is 625. The square root of 457 is somewhere 
between 20 and 25, but it is obviously closer to 20. The difference 
between 400 and 625 is 225, and 57 is approximately one-fourth of 
this amount; therefore the square root of 457 is approximately 20 + 
1/4 (5) or 21.25. The correct value is 21.38; hence the value by 
inspection is not even correct to one decimal place. 
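
The mental interpolation used in this example can be written out as a short Python sketch. The function name `sqrt_by_inspection` is an assumption made here; the function interpolates in a straight line between two numbers whose squares are known, whereas the text rounds the interpolating fraction to one-fourth.

```python
import math


def sqrt_by_inspection(n, low, high):
    """Interpolate linearly between low and high, where low**2 <= n <= high**2."""
    fraction = (n - low ** 2) / (high ** 2 - low ** 2)
    return low + fraction * (high - low)


estimate = sqrt_by_inspection(457, 20, 25)
print(round(estimate, 2), round(math.sqrt(457), 2))
```

The estimate falls below the true root because the curve of squares rises faster than the straight line between the two known points.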

A variation of the method used in the preceding example will yield 
better results if a calculating machine is available. Suppose the square 
root of 12750 were wanted. The square of 105 is 11025 and the 
square of 110 is 12100 and the square of 115 is 13225. The square 
root of 12750 is between 110 and 115 but is a little closer to 115. 
Therefore square 113 on the calculating machine, securing 12769. 
This is rather close, but the next step, if more accuracy is wanted, is to 
square 112.9. The result 12746 is still closer and further trials will 
give 112.92 as the correct root to five figures. 
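
The trial-and-square procedure can be imitated in Python by repeatedly squaring a trial value and narrowing the interval in which the root must lie. This is a sketch under assumptions: halving the interval at each trial is a choice made here for simplicity, not the sequence of trials a computer at a calculating machine would necessarily follow.

```python
def sqrt_by_trial(n, low, high, tolerance=1e-6):
    """Narrow the interval [low, high] by squaring successive trial values."""
    while high - low > tolerance:
        trial = (low + high) / 2
        if trial * trial > n:
            high = trial   # trial is too large
        else:
            low = trial    # trial is too small (or exact)
    return (low + high) / 2


root = sqrt_by_trial(12750, 110, 115)
print(round(root, 2))
```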

The inspection method yields good results quickly after a little 
practice. Even though it is not used as a method of finding the square 
root, it is valuable as a checking device when other methods are em- 
ployed. This is particularly true when roots are found by logarithms, 
the slide rule, or a square root table. 

By Arithmetic Computation 

When no auxiliary devices are available and accurate results are 
required square roots can be found by the following steps. 

Step 1. Divide the number into groups of two digits each way 
from the decimal point. The last group on the left will contain only 
one digit if the number has an odd number of digits to the left of the 
decimal point. 

Step 2. The largest number whose square does not exceed the 
value of the digit or pair of digits in the left-hand group of the number 
is the first figure of the root. This figure is entered above the left 
hand group. 

Step 3. Subtract the square of the first figure of the root from the 
left group of the number. 

Step 4. At the right of the remainder of Step 3, annex the figures 
in the second group of the number. This is the new dividend. 

Step 5. Double the root already found and annex one zero to the 
right as a trial divisor and divide it into the dividend of Step 4 to 
obtain the second figure of the root which is entered over the second 
pair of digits. This new figure will often be too high and must be 
corrected by trial and error. 



Step 6. The new figure is added to the trial divisor of Step 5 to 
give the true divisor which is then multiplied by the new figure of 
the root to give a product which is entered under the dividend of 
Step 4. 

Step 7. This product is subtracted from the dividend, the next 
group of digits is annexed on the right of the remainder, and the 
process of Step 5 and Step 6 is repeated. 
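
The seven steps can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `square_root_by_pairs` is hypothetical, the number is supplied as a string of digits so that it can be grouped in pairs, and the root is truncated (not rounded) at the requested number of decimal places.

```python
def square_root_by_pairs(number, decimals=2):
    """Extract a square root figure by figure, following Steps 1 to 7."""
    whole, _, frac = number.partition(".")
    # Step 1: group the digits in pairs each way from the decimal point.
    if len(whole) % 2:
        whole = "0" + whole
    frac = (frac + "0" * (2 * decimals))[: 2 * decimals]
    digits = whole + frac
    pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]

    root = 0
    remainder = 0
    for pair in pairs:
        # Step 4: annex the next group to the remainder to form the dividend.
        dividend = remainder * 100 + pair
        # Step 5: trial divisor is double the root found so far, with a zero annexed.
        trial_divisor = 20 * root
        figure = min(dividend // trial_divisor, 9) if trial_divisor else 9
        # Correct the new figure by trial and error if the product is too large.
        while (trial_divisor + figure) * figure > dividend:
            figure -= 1
        # Steps 6 and 7 (and Steps 2 and 3 on the first pass): subtract the product.
        remainder = dividend - (trial_divisor + figure) * figure
        root = root * 10 + figure

    text = str(root).zfill(decimals + 1)
    return text[:-decimals] + "." + text[-decimals:] if decimals else text


print(square_root_by_pairs("12750", 2))
print(square_root_by_pairs("4693.49", 3))
```

Because the sketch truncates, the first result is 112.91; rounding the last figure upward when the final remainder exceeds half of the last divisor gives 112.92.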

Two examples of the use of these seven steps in extracting the 
square root are shown in Figure 1. The examples are constructed 
to emphasize the growth of the solution as the successive steps of 
the process are applied to the examples. The complete solution for 
future use is given in the last computation at the bottom of the Figure. 



FIGURE 1 

TWO EXAMPLES OF THE DEVELOPMENT BY SUCCESSIVE STEPS OF THE SOLUTION FOR 
EXTRACTING SQUARE ROOT 

(Complete solutions: Steps 1 to 7, with Steps 5, 6 and 7 repeated.) 

Find the square root of 12750 

              1  1  2. 9  1
              1 27 50.00 00
              1
    20          27
     1
    21          21
   220           6 50
     2
   222           4 44
  2240           2 06.00
     9
  2249           2 02.41
 22580              3.59 00
     1
 22581              2.25 81
                    1.33 19

The square root of 12750 is 112.92. (The last digit of the root is 
increased to 2 because the remainder is more than half of the last 
divisor.) 

Find the square root of 4693.49 

               6  8. 5  0  9
              46 93.49 00 00
              36
   120        10 93
     8
   128        10 24 *
  1360           69.49
     5
  1365           68.25
 13700            1.24 00 †
137000            1.24 00 00
     9
137009            1.23 30 81
                       69 19

* The first trial figure is 9, but 129 × 9 = 1161. This product is too 
large, hence the new figure of the root should be 8. 

† 13700 will not go into 12400, therefore the next figure of the root 
is 0 and the next group of figures is brought down. 

The square root of 4693.49 is 68.509. 


The example on the right shows how to find the new figure in the 
root by trial and error in Step 5 and how to deal with a zero in 
the root. There will be one digit to the left of the decimal point in the 
root for every pair of figures to the left of the decimal point in 
the original number. Likewise there will be one digit to the right of the 
decimal point in the root for every pair of figures appearing in or 
added to the original number to the right of the decimal point. The 
last statement is particularly important in taking the square root of 
numbers less than one. 



ACCURACY OF STATISTICAL DATA 

The question of how many figures shall be retained in the result 
of a computation is particularly important in statistical work because 
many of the data employed are to some degree approximate. The 
problem of the statistician can be explained by contrasting his compu- 
tations with those of the bookkeeper who is engaged in keeping a 
record of numerical facts in dollars and cents. The latter must keep 
his records accurate to two decimal places. Suppose the following 
inventory of raw materials was being prepared: 

1,367 ft. 1 in. round iron at $5.25 per 100 ft. ................ $ 71.77 
11,000 ft. 2\ in. X \ in. strap iron at $7.62 1/2 per 100 ft. .... $838.75 
etc. 

The first entry might be carried out to four decimal places, i.e., to 
$71.7675, but the last two places are of no value to the bookkeeper 
who is interested in accuracy only to the nearest cent. Similarly the 
second entry is carried to cents although the 11,000 ft. may be only 
an estimate and even the $838 not entirely accurate. The question 
of how many figures to retain does not arise in either of these cases 
nor does it arise in any case for the bookkeeper. The statistician is not 
in a similar position because more commonly he is dealing with data 
that are not expressed in dollars. Even when he deals with data ex- 
pressed in dollars the question is not likely to be whether they should 
be carried to the nearest cent but whether to use a unit of $100, $1,000, 
or $1,000,000. 

Statistics is often defined as the science of large numbers, and prop- 
erly interpreted this definition is sound, but to many it merely marks 
the statistician as one who works with figures containing six or eight 
or even ten digits. Nothing could be farther from the facts than this 
impression. True, the statistician deals with aggregates of the magni- 
tude of millions or billions, but a part of his working equipment con- 
sists in the use of well-established rules for rounding off large numbers. 

Rounding Off Numbers 

Meaning. Precision work in a machine shop is seldom more ac- 
curate than to one part in a thousand. If the statistician dealing with 
data concerning the business world can achieve the same degree of 
accuracy as the machinist, the results will be amply satisfactory. Let 
us examine the meaning of data accurate to one part in a thousand. 
The average weekly earnings of factory workers in December, 1936, 
according to the National Industrial Conference Board, was $26.63. 
This figure is an average obtained by dividing total weekly payrolls 
by number of workers employed and means that the result of the division 
was somewhere between $26.625 and $26.635. Hence a complete 
statement of the figure would be $26.63 ± .005. That is, a variation 
of as much as .005 may be present in 26.63, or a variation of 5 in 
26,630 which is equivalent to 1 in 5,326. The average weekly earn- 
ings figure quoted to the nearest cent, therefore, is accurate to 1 part 
in 5,000 approximately. 
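
The reasoning of this example can be put in a short Python sketch. The function name `one_part_in` is an assumption made here; it supposes that the last quoted digit is correct to within half a unit and returns the N in "accurate to 1 part in N".

```python
from fractions import Fraction


def one_part_in(quoted):
    """Return N such that a figure quoted as `quoted` is accurate to 1 part in N."""
    _, _, frac = quoted.partition(".")
    # Half a unit in the last quoted place, e.g. .005 for a figure quoted to cents.
    half_unit = Fraction(1, 2 * 10 ** len(frac))
    return Fraction(quoted) / half_unit


print(one_part_in("26.63"))  # average weekly earnings, quoted to the nearest cent
print(one_part_in("1000"))   # the smallest four-digit whole number
```

The second call shows why any figure quoted to four accurate digits is good to at least 1 part in 2,000.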

From this example it will be clear that any figure quoted to four 
digits is accurate to at least 1 part in 2,000, on the assumption, 
of course, that the four quoted figures are accurate. Hence all the 
precision that is needed in statistical work can be provided by maintain- 
ing accuracy to four digits. In the preceding example this requirement 
was met by quoting weekly earnings to the nearest cent, but more gen- 
erally four-digit accuracy will be sufficient regardless of the relation 
of the four digits to the position of the decimal point. 

Significant Figures. In a single number or in the results of a com- 
putation the digits that show the extent to which the figure is accurate 
are called significant figures. Some examples will help in understand- 
ing this definition. The number 98,000,000 has two significant figures 
unless it is known from the surrounding circumstances that the zeros 
are an accurate representation. If the actual amount represented may 
be anything between 97,500,000 and 98,500,000 then only the first 
two figures, 98, are significant and the zeros have no other purpose 
than to show the position of the decimal point. On the other hand 
if it is known that the actual amount lies somewhere between 97,999,500 
and 98,000,500 then the original number would have five significant 
figures. That is, the first three zeros would be significant while 
the last three would serve the function of showing the position of 
the decimal point. Unless some indication to the contrary is given, 
the final zeros of a whole number are not to be considered significant. 
Likewise in a number less than one, zeros immediately following the 
decimal point are not significant. For example, .00042 has two signifi- 
cant figures, but .000420 has three significant figures because the final 
zero should be taken to mean that the operation was carried to three 
digits and the third one was found to be zero. That is, the actual facts 
are somewhere between .0004195 and .0004205. But .00042 means that 
the actual facts are somewhere between .000415 and .000425. 
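
The limits implied by a quoted decimal can be computed with Python's standard `decimal` module, which keeps trailing zeros exactly as written. This is a sketch only: the function names are assumptions made here, and `significant_figures` applies only to numbers written with a decimal point, since the final zeros of a whole number such as 98,000,000 cannot be judged from the digits alone.

```python
from decimal import Decimal


def implied_range(quoted):
    """Limits between which the actual value lies: half a unit in the last place."""
    d = Decimal(quoted)
    half_unit = Decimal(5).scaleb(d.as_tuple().exponent - 1)
    return d - half_unit, d + half_unit


def significant_figures(quoted):
    """Count digits from the first non-zero digit on; final zeros after the point count."""
    return len(Decimal(quoted).as_tuple().digits)


for quoted in ("0.00042", "0.000420"):
    print(quoted, significant_figures(quoted), implied_range(quoted))
```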

The argument of the preceding section can be summarized in terms 
of significant figures as follows: regardless of the absolute size of any 
datum 7 not more than four significant figures need be retained for 
statistical purposes. 

Method of Rounding Off. When data are expressed to more than 
four significant figures, or more generally whenever a reduction in 
the number of significant figures is desired, methods of rounding off 
must be followed. There is no universally used set of rules for round- 
ing numbers, but a set which has wide acceptance may be stated as 
follows: 

1. When more than five is eliminated the preceding digit should be in- 
creased by one. 

2. When less than five is eliminated the preceding digit should not be 
changed. 

3. When exactly five is eliminated the preceding digit should be increased 
by one if it is an odd number but should not be changed if it is an 
even number. 

Examples: 

GIVEN NUMBER     ROUNDED NUMBER 

1267862              1268 
8762180              8762 
5863500              5864 
5862500              5862 
5862517              5863 
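
The three rules can be sketched in Python for whole numbers. The function name `round_off` is an assumption made here; like the table of examples, it returns only the retained leading digits. Python's built-in `round` follows the same convention of rounding an exact five to the even digit.

```python
def round_off(number, figures=4):
    """Return the leading `figures` digits of a whole number, rounded by Rules 1 to 3."""
    drop = len(str(number)) - figures
    if drop <= 0:
        return number
    unit = 10 ** drop
    kept, eliminated = divmod(number, unit)
    half = unit // 2
    # Rule 1: more than five eliminated. Rule 3: exactly five, preceding digit odd.
    if eliminated > half or (eliminated == half and kept % 2 == 1):
        kept += 1
    # Rule 2 (less than five) and the even case of Rule 3 leave `kept` unchanged.
    return kept


for given in (1267862, 8762180, 5863500, 5862500, 5862517):
    print(given, round_off(given))
```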

7 The word "data" should be used in a plural construction. The singular is "datum," 
referring to a single item or figure as used here. 

Sometimes a number rounded to four significant figures is subsequently 
rounded to three significant figures. This can be done by applying 
the same rules, except for a number such as 467465. Rounded to 
four significant figures the number becomes 4675 and subsequently 
rounded to three significant figures according to Rule 3 it becomes 468, 
but obviously the result of rounding to three significant figures should 
be 467. This case is covered by an auxiliary rule sometimes followed 
by computers: "If a number when rounded upward ends in an even 
5, indicate that fact by a prime (')." According to this rule the 
four-significant-figure result for the example would be written 4675' 
to indicate that in any subsequent rounding to three significant figures 
the third digit should not be increased to 8. 
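
The difficulty of rounding twice, and the effect of the prime, can be sketched in Python. This is one reading of the auxiliary rule, stated as an assumption: a final 5 that was produced by rounding upward is marked, and in any later rounding that eliminates only that 5 it is treated as less than five. The function names are hypothetical.

```python
def round_off(number, figures):
    """Return (leading digits, prime) after rounding to `figures` significant figures."""
    drop = len(str(number)) - figures
    if drop <= 0:
        return number, False
    unit = 10 ** drop
    kept, eliminated = divmod(number, unit)
    half = unit // 2
    rounded_up = eliminated > half or (eliminated == half and kept % 2 == 1)
    if rounded_up:
        kept += 1
    # The prime marks a final 5 that was reached by rounding upward.
    return kept, rounded_up and kept % 10 == 5


def round_primed(kept, prime, figures):
    """Round an already rounded figure again, honoring the prime."""
    if prime and len(str(kept)) - figures == 1:
        return kept // 10  # the primed 5 counts as less than five
    return round_off(kept, figures)[0]


direct, _ = round_off(467465, 3)
four, prime = round_off(467465, 4)
twice_unmarked, _ = round_off(four, 3)
twice_marked = round_primed(four, prime, 3)
print(direct, four, prime, twice_unmarked, twice_marked)
```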

The foregoing is a statement of the mechanics of rounding off 
numbers. However, the statistician's problem does not end here, 
because in many cases the data with which he must work will not be 
accurate to four significant figures and usually the degree of accuracy 
is not stated. In such cases formal rules must be supplemented by a 
knowledge of the background of particular data. We proceed, there- 
fore, to a detailed description of the kinds of figures which appear in 
statistical work and the basis for judging their accuracy. 

Counting and Measurement 

In statistical work enumerations of two kinds appear: (1) those in 
which the units are counted, and (2) those in which the units are 
measured. For example, the value of exports of 102 countries in 1935 
as reported by the United States Department of Commerce was 
$11,580,000,000. The number of countries included in the report was 
an exact count and any computations based upon it would not be 
subject to error. The value of exports was obtained by totaling the 
reports of customs of the several countries after converting the different 
monetary units to dollars, using some agreed upon set of exchange 
ratios. Due to inaccuracies of reporting within individual countries, 
variations in methods of valuing exports and the complication of 
applying exchange ratios between different monetary units, the figure 
for value of exports is at best only an approximate measurement. 

Cases similar to both of these appear in statistical work. Units 
which are counted give rise to little or no difficulty in subsequent work. 
They may be accurate to five or six or more significant figures but not 
more than four need be retained in statistical work. On the other 
hand units which are measured immediately lead to the question: 
How accurate are the results? Sometimes this question is answered 
specifically. 

For example, the United States Department of Agriculture defines 
No. 1 Soft Red Winter Wheat as follows: 

Minimum test weight per bushel ............ 60 lbs. 
(Scales accurate to one-tenth of a pound, hence the wheat must 
weigh more than 59.9 pounds to be No. 1 grade.) 

Maximum limits of 
    Damaged kernels ....................... 2% 
    Foreign material ...................... 1% 
    Wheat of other classes ................ 5% 
(All of these are taken to the nearest per cent when tests are 
made.) 8 

Another example which states the margin of error is presented in 
Table 4. 

TABLE 4 

LONG-TERM PRIVATE DEBT: ESTIMATED AMOUNTS OUTSTANDING FOR 1912, BY CLASSES* 
(Amount of debt in billions and tenths of billions of dollars) 

                      ESTIMATED     PER CENT        MARGIN OF ERROR
CLASS OF DEBT         DEBT          DISTRIBUTION    (per cent)

Total                   31.3          100.0              10
Railway                 10.7           34.2
Public utility           5.3           16.8               5
Industrial               4.5           14.4              10
Farm mortgage            3.8           12.2               4
Urban real estate        7.0           22.4              15

NOTE. The margins of error shown in the table for private debt represent a 
non-statistical evaluation of the figures by the estimator. 

* Statistical Abstract, 1936, United States Department of Commerce, Bureau of Foreign and 
Domestic Commerce, p. 273. 

Examples of Measurement 

More commonly the error to be expected is not indicated. Thus 
the user of the data is left to judge the degree of accuracy which can 
properly be attributed to them. Judgments of this sort must be based 
upon a knowledge of the method used in obtaining the data and a 
background of information concerning the source. For example, the 
United States Department of Agriculture announces in December the 
estimated crop of winter wheat for the year. The estimate as of 
December 1, 1937, was 873,993,000 bushels. The department receives 
annually about 160,000 reports from farmers in all sections of the 
country; these include estimated acreage of wheat planted and estimated 
yield per acre. The two estimates are multiplied together to 
give approximate production in each locality. These approximate 
production figures are then weighted according to the probable total 
production which each represents, and combined. The result is an 
estimate of the production of wheat in the entire country. The process 
is actually much more refined than this incomplete statement would 
imply. The results obtained, while not entirely accurate, usually prove 
to be within 2 or 3 per cent of the actual production recorded in 
agricultural censuses. 

8 Handbook of Official Grain Standards of the United States, United States Department 
of Agriculture, Bureau of Agricultural Economics, revised June, 1937. 

An example of a different sort is the monthly report of floor space 
in new buildings contracted for, as compiled by the F. W. Dodge 
Corporation. The totals for thirty-seven states of the United States 
east of the Rocky Mountains are aggregates of the reports of local 
offices in all sections of this area, supplemented by reports from cor- 
respondents. The floor space in a building is estimated from the plans 
used in letting the contract for the building. These estimates are not 
intended to give the exact number of square feet of floor space; they 
may vary as much as 10 per cent from the actual area. When many 
such estimates are combined the underestimates tend to balance the 
overestimates so that the aggregate figure may be much more nearly 
correct than the individual estimates. However, in reporting 
non-residential construction contracts for January, 1938, as 9,637,000 
square feet, an error of as much as 200,000 square feet might easily 
be present. 

A third example is the report by the Bureau of Internal Revenue of 
the Treasury Department on net income of corporations for a year. 
The aggregate net income of all reporting corporations is carefully 
compiled from the bureau's files, hence the 1934 income of $596,048,000 
is considerably more accurate than the figures of either of 
the preceding examples. 

These examples indicate the extent to which a background of 
knowledge of methods of collection is necessary in understanding the 
accuracy of data. The figures in the three illustrations are not equally 
accurate. The crop estimate may easily be in error by as much as 10 
million bushels, hence not more than two significant figures of the 
estimate are accurate and it might as well be stated as 870 million 
bushels. There is false accuracy in stating this estimate to the nearest 
thousand bushels, because comparisons with the quinquennial census 
of agriculture show that the estimates usually differ by several million 
bushels. False accuracy is common in published data but it causes 
little difficulty so long as the background of the data is sufficiently 
familiar for users to be aware that more significant figures have been 
retained than is warranted. 

The same argument applies to the figure for floor space of con- 
struction contracts. It might better be stated as 9 million square feet, 9 
since a variation of as much as 200,000 square feet is inherent in the 
method of collecting the data. On the other hand the figure for cor- 
poration income tax is accurate to six significant figures, but there is 
no need to retain more than four significant figures; hence the figure 
should be written as 596.0 million dollars. The zero following the 
decimal point is written just as a digit other than zero would be written 
to show data accurate to the nearest hundred thousand dollars. 

Significant Figures in Computation 

The emphasis up to this point has been on the number of significant 
digits to retain in a single figure or a list of figures pertaining to a 
single subject. We are now ready to develop methods of dealing with 
rounded numbers in performing computations. The rules applicable 
to each of the four fundamental operations will be explained in order. 10 

In Addition. Each of the examples in Table 5 illustrates a par- 
ticular point in dealing with approximate numbers. In Example A 
exports from each division are given to the nearest hundred thousand 
dollars. This is done because the data are no more accurate than to 
that unit and because no greater accuracy is needed in statistical work. 
In the total there is no reason for retaining the fifth significant figure, 
and total value of exports for 1934 may be stated as 2,133 million 
dollars. 

In Example B the operating revenues have been carried to the nearest 
dollar. The data are perfectly accurate since they come from audited 
statements submitted to the Interstate Commerce Commission by the 
railroads. There is, however, no advantage in retaining ten significant 
figures in statistical work. According to the rule the operating revenue 
may be stated as 3,272 millions of dollars or the figure may be carried 
to dollars rounded off to the nearest million as shown in the table. 

9 When the size of the unit in which data are expressed is increased the change should 
be from single units to thousands or millions or billions rather than to intermediate sized 
units. Thus 12,416,736 could be stated as 12,417 thousands if it were accurate to five 
digits, as 12.42 millions if it were accurate to four digits, and as 12.4 millions if it 
were accurate to three digits. The expression of the last two examples in the form 
1,242 ten thousands and 124 hundred thousands, respectively, must be frowned upon 
because of the potential confusion in the minds of students as to the number of zeros 
to be added, if one wishes to return to the original unit. 

10 These rules are not applicable to bookkeeping where accuracy must be maintained 
to the nearest cent regardless of the number of significant digits retained in a particular 

TABLE 5 

EXAMPLES OF ROUNDING OFF MEASUREMENTS IN ADDITION 

A 
VALUE OF UNITED STATES EXPORTS OF MERCHANDISE BY COAST AND 
BORDER DIVISIONS, 1934 * 

                       VALUE OF EXPORTS
                       (in millions and
DIVISION               tenths of millions)

North Atlantic              $  810.8
South Atlantic                 207.3
Gulf Coast                     509.9
Mexican Border                  48.0
Pacific Coast                  259.8
Northern Border                297.5
Total                       $2,133.3
Rounded total               $2,133.

B 
OPERATING REVENUES OF CLASS I RAILROADS OF THE UNITED STATES 
BY SOURCE, 1934 * 

SOURCE                      REVENUE

Freight                 $2,629,301,525
Passenger                  345,889,550
Mail                        91,139,847
Express                     54,013,025
Other                      151,222,875
Total                   $3,271,566,822
Rounded total           $3,272,000,000

C 
AREA OF LAND IN THE UNITED STATES FOR WHICH TITLE REMAINED WITH THE 
GOVERNMENT ON JUNE 30, 1935 * 

                                                          NUMBER
USE                                                       OF ACRES

National forests                                        138,710,942
National parks and monuments                              8,724,737
Indian reservations (estimated net)                      57,518,590
Military, naval, experimental reservations, etc.
  (approximate)                                           1,000,000
Unappropriated, but withdrawn (approximate)             197,261,754
Total                                                   403,216,023
Rounded total                                           403,000,000

* World Almanac, 1936. 

Example C differs from A and B in that part of the data are 
approximations. The area of the military, naval, and other reservations 
is estimated at 1,000,000 acres without any attempt to be more accurate. 
The areas of the Indian reservations and the unappropriated lands are 
likewise only approximate, yet the figures are given to an acre. There 
is false accuracy in these two figures, and an inconsistency in the table. 
The result should not be carried beyond the limit of the least accurate 
figure which appears to be millions of acres. The total, therefore, 
should be stated to only three significant figures. 
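
The principle that a sum should not be carried beyond the limit of its least accurate item can be sketched in Python. This is an illustration under assumptions: each figure is paired with a unit of accuracy, and the unit of one million acres assigned to the approximate items of Example C is a reading of the text made here, not a value stated in the source. The function names are hypothetical.

```python
def round_to_unit(value, unit):
    """Round a whole number to the nearest multiple of `unit` (exact halves go to even)."""
    quotient, remainder = divmod(value, unit)
    if 2 * remainder > unit or (2 * remainder == unit and quotient % 2 == 1):
        quotient += 1
    return quotient * unit


def rounded_sum(items):
    """items: (value, unit) pairs; the total is rounded to the coarsest unit present."""
    total = sum(value for value, _ in items)
    coarsest = max(unit for _, unit in items)
    return total, round_to_unit(total, coarsest)


example_c = [
    (138_710_942, 1),          # national forests
    (8_724_737, 1),            # national parks and monuments
    (57_518_590, 1_000_000),   # Indian reservations (unit assumed)
    (1_000_000, 1_000_000),    # military and other reservations
    (197_261_754, 1_000_000),  # unappropriated lands (unit assumed)
]
print(rounded_sum(example_c))

# Example B: exact data, rounded to the nearest million for statistical use.
print(round_to_unit(3_271_566_822, 1_000_000))
```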

The conclusion to be drawn from these examples is that usually no 
more than four significant figures are to be retained in a sum, and when 
the data are not accurate to four digits fewer should be retained to 
avoid introducing false accuracy in the result. It is not to be implied 
that all cases which arise will conform to these three examples. On 
the other hand study of these examples will provide guidance in the 
selection of the proper number of significant figures to retain in any 
set of data. 

In Subtraction. The rules for subtraction are the same as those for 
addition. For example, a method which is commonly used in measur- 
ing the number of automobiles withdrawn from service each year 
includes these steps. 

Number of automobiles registered, 1933                        23,843,591 
Number of automobiles produced for the domestic market, 1934   2,442,389 
Number which could have been registered, 1934                 26,286,180 
Number actually registered, 1934                              24,933,403 
Number withdrawn from service, 1934                            1,352,777 

This method is not particularly accurate as a measure of cars 
"scrapped" because all second-hand cars taken in by dealers and not yet 
resold as well as cars which are temporarily unlicensed by their owners 
are included in the 1,352,777. At the moment, however, the rounding 
off of the figures is the point of interest. In spite of the fact that these 
figures imply an accurate count of automobiles, they are really subject 
to a substantial margin of error. The exact amount of error is unknown, 
but no inconvenience follows because there is no advantage in retaining 
more than four significant figures. The result would therefore be stated 
as 1,353,000 automobiles withdrawn from use during 1934. There is 
no certainty that these data are accurate even to the nearest thousand, 
but they would be assumed to be accurate to that extent unless definite 
information to the contrary were at hand. 
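
The last step of this computation, together with the rounding to four significant figures, might be sketched as follows. The function name round_significant is hypothetical, and upward rounding of exact halves is assumed.

```python
def round_significant(n, figures=4):
    """Round a non-negative integer to `figures` significant figures.

    Assumption: exact halves are rounded upward.
    """
    digits = len(str(n))
    if digits <= figures:
        return n
    unit = 10 ** (digits - figures)
    return ((n + unit // 2) // unit) * unit


could_have_been_registered = 26_286_180   # figure stated in the text for 1934
actually_registered = 24_933_403          # figure stated in the text for 1934

withdrawn = could_have_been_registered - actually_registered
print(f"{withdrawn:,}")                       # difference as first computed
print(f"{round_significant(withdrawn):,}")    # four significant figures
```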

In Multiplication. The rule of four significant figures holds for 
multiplication just as in the preceding operations, but an additional 
rule must also be observed. The product of two measurement numbers 
must not be retained to more significant figures than the least number 
of significant figures in either the multiplier or multiplicand. For ex- 
ample, during May, 1936, 2,648,330 long tons of pig iron were 
produced in the United States. At that time the Composite Pig Iron 
Price was $19.96 per long ton. The value of the month's production 
was 2,648,330 X $19.96 = $52,860,666.80. But the data for pig-iron 
production are approximate to an unknown extent and the composite 
price is an average which is accurate only to the nearest cent. Assuming 
that the production figure is accurate to the nearest hundred tons would 
give five significant figures, but there are only four significant figures 
in the price. Therefore, the value of production should be stated to
only four significant figures and could be written 52.86 million dollars.
If the figure were written complete, it should be $52,860,000. The 
reason for this approximation will be apparent from the two following 
computations which show the maximum and minimum values which 
this product may take when the last significant figure of each number 
is given its maximum and minimum value. 

          A                  B

       MINIMUM            MAXIMUM

      2,648,250          2,648,350
       X 19.955           X 19.965
     ----------         ----------
     52,845,829         52,874,308

These products differ in the fourth significant figure. It is therefore 
apparent that nothing beyond the fourth figure is of any value and 
indeed that the fourth figure is not exact, although sufficiently accurate 
for statistical work. 11 
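
The two limiting products can be checked with exact decimal arithmetic. The following is a minimal sketch, not a prescribed procedure; the helper leading_digits_in_common is a hypothetical name and assumes that both numbers have the same number of digits.

```python
from decimal import Decimal


def leading_digits_in_common(a, b):
    """Count how many leading digits two positive whole numbers share."""
    count = 0
    for digit_a, digit_b in zip(str(a), str(b)):
        if digit_a != digit_b:
            break
        count += 1
    return count


# Limits used in the text: production to the nearest hundred tons,
# price to the nearest cent.
minimum = Decimal("2648250") * Decimal("19.955")
maximum = Decimal("2648350") * Decimal("19.965")
stated = Decimal("2648330") * Decimal("19.96")

print(minimum, stated, maximum)
print(leading_digits_in_common(int(minimum), int(maximum)))
```

Only the first three digits of the two limits agree, so the fourth figure of the stated product is already approximate.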

The rule for multiplying also includes the special case of squaring a 
measurement number. The significant digits of the square should not 
exceed the significant digits of the original number. For example, the 
square of the measurement 26.85 should be retained as 720.9. 

In Division. The rule for division is: There should be no more sig- 
nificant figures in the quotient than the least number which appears in 
either the dividend or the divisor. The general rule of not more than 
four significant figures in a statistical calculation also applies. For 
example, the December 1, 1936, final estimate of the United States 
cotton crop for 1936 according to the Department of Agriculture was 
12,399,000 bales. The estimate of acreage harvested was 30,028,000 
acres. The average yield per acre is obtained by dividing the produc- 
tion by the acreage, i.e., 12,399,000 ÷ 30,028,000 = .41291 bales
per acre. The result may be carried to five digits according to the 
rule that the significant figures retained in the quotient should not 
exceed the number of significant figures in either dividend or divisor, 
whichever is smaller. However, statistical work usually requires keep- 
ing only four significant figures. Rounding off to four places, then, 
the production is .4129 bales per acre. Actually the result would be 
expressed in pounds by multiplying the average in bales (.4129) by 
500. The average yield in pounds would be 206.5 per acre. 
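
The cotton example may be reproduced as a sketch in Python. The function round_sig is a hypothetical helper; it assumes that exact halves are rounded upward, which agrees with the figure of 206.5 pounds given above.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(value, figures):
    """Round a Decimal to `figures` significant figures (halves rounded up)."""
    if value == 0:
        return value
    # adjusted() gives the power of ten of the leading digit.
    quantum = Decimal(1).scaleb(value.adjusted() - figures + 1)
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


production_bales = Decimal(12_399_000)   # five significant figures
acreage = Decimal(30_028_000)            # five significant figures

yield_bales = production_bales / acreage
five_figures = round_sig(yield_bales, 5)   # limit set by the rule for division
four_figures = round_sig(yield_bales, 4)   # limit usual in statistical work
yield_pounds = round_sig(four_figures * 500, 4)

print(five_figures, four_figures, yield_pounds)
```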

11 George G. Chambers, in An Introduction to Statistical Analysis (F. S. Crofts and 
Co., New York, 1925), would not retain the fourth significant figure. His rule is "if the 
product of two single number approximations is expressed as a single number approxima- 
tion the integer [significant figures] of the product is less than the integer [significant 
figures] of the least accurate factor," p. 27. 




Suppose that the acreage harvested were not considered accurate to 
the nearest 1,000 acres, but were reported as 30.0 million acres 
accurate only to the nearest 100,000 acres. The average yield per acre 
would then be 12,399,000 ÷ 30,000,000 = .413 bales. The result
can be carried to no more than three significant figures because only 
three are significant in the divisor. The reason for retaining only three 
figures will be apparent, if the divisions are made giving dividend and 
divisor their minimum and maximum values. 

          A                             B

       MINIMUM                       MAXIMUM

     12,398,300                    12,399,300
     ---------- = .4126            ---------- = .4140
     30,050,000                    29,950,000

The fourth figure has no significance whatever since the two quotients 
do not agree even in the third figure. As stated in the discussion of
multiplication, the third figure is somewhat approximate but is accurate 
enough for use in statistical work. 12 
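
The limiting quotients may be sketched as below. The sketch assumes that each figure may be in error by half a unit of its stated accuracy (500 bales and 50,000 acres); the limits printed in the table may have been chosen slightly differently.

```python
dividend = 12_399_000      # bales, taken as accurate to the nearest 1,000
divisor = 30_000_000       # acres, taken as accurate to the nearest 100,000

# Assumption: each figure may be off by half a unit of its stated accuracy.
minimum = (dividend - 500) / (divisor + 50_000)
maximum = (dividend + 500) / (divisor - 50_000)

print(round(minimum, 4), round(maximum, 4))
print(round(dividend / divisor, 3))
```

The two limits already part company in the third decimal place, which is why the quotient is carried to three figures only.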

In Extracting Square Roots. The reverse of the rule for squaring 
numbers holds for square roots. That is, as many significant figures 
may be retained in the root of a measurement number as there are in 
the number. Hence √327 = 18.1.
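
The rules for squares and square roots can be illustrated together. In this sketch the helper names are hypothetical, and the count of significant figures assumes that every digit written after any leading zeros is significant, which is not always true of trailing zeros in whole numbers.

```python
from math import floor, log10, sqrt


def significant_figures(text):
    """Count the significant figures in a positive number written as text."""
    return len(text.replace(".", "").lstrip("0"))


def round_significant(x, figures):
    """Round a positive number to `figures` significant figures."""
    exponent = floor(log10(x))
    return round(x, figures - 1 - exponent)


measurement = "26.85"
square = round_significant(float(measurement) ** 2,
                           significant_figures(measurement))

radicand = "327"
root = round_significant(sqrt(float(radicand)),
                         significant_figures(radicand))

print(square, root)
```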

SUMMARY 

The purpose of this chapter was stated as an attempt to set forth 
in elementary form a background of computation methods which would 
facilitate the work of subsequent chapters. The first part is devoted 
to a review of arithmetic processes while the second deals with the 
rounding off of figures for statistical purposes and the rules for 
determining the number of significant figures to be retained. The 
material presented is, of course, germane to all operations with num- 
bers, but is particularly useful in statistical computation. 

The primary task of the statistician is not, however, to make of 
himself a "figuring fool." The most important task is mastery of the 
techniques which will be developed in the chapters which follow. 
Ability to compute rapidly and accurately must be considered as the 
necessary background for, but not the main object of, statistical work. 

12 Perhaps attention should be directed again to the meaning of the expression "good 
enough for statistical work." The figure .413 bales per acre appearing in print should 
be taken to mean not less than .4125 and not more than .4135. The variation in either 
direction is .0005 on .413 or 5 on 4130 which is a variation of one part in 826 and 
this is accurate enough for statistical work. 
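
The arithmetic of this footnote may be sketched with exact fractions. The function name degree_of_accuracy is hypothetical, and the sketch assumes that the last digit written is the last significant digit.

```python
from fractions import Fraction


def degree_of_accuracy(text):
    """Return the half-unit error of a decimal numeral as a fraction of its value.

    Assumption: the last digit written is the last significant digit.
    """
    decimals = len(text.split(".")[1]) if "." in text else 0
    half_unit = Fraction(1, 2 * 10 ** decimals)
    return half_unit / Fraction(text)


accuracy = degree_of_accuracy("0.413")
print(accuracy)
```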



PROBLEMS



Problems 1-10 are self-tests in which the student can check his own per- 
formance against the standard time listed. Do not write answers in the book 
in order that additional trials can be made if the first one fails to meet the 
time limit. 



1. Addition (40 seconds)

    943    167    4956    2286    6269
    376    742    6237    7463    4728
    641    969     312    9498    8247
    879    378    8468    4537    3722


2. Addition (3 minutes)

    876    24.29    476,876     31.35    .4832    1371.10
    937    15.15    377,139     42.50    .1887    1229.48
    711    41.69    991,387      1.46    .0942     782.20
    492    39.63    398,872     23.59    .3948      59.35
    321    23.15    814,612      5.62    .0038    2892.36
    173    15.12    329,388      7.35    .1850     755.73
    288     4.28    376,441     66.75    .3763    3842.45
    317    16.14    114,473    103.43    .0382    4721.21
    222    34.99    787,224     35.78    .0976    2783.29
    384    55.29    716,326      7.11    .1956    1972.48


3. Subtraction (35 seconds)

    1090    8617    31.762    217.32     $27,218.45    $586.89
     585    7758     4.86      29.685     11,216.10     497.98


4. Multiplication (1 minute, 30 seconds)

    921    875    726    486    1269    8296
     23     19     68     35     137     864


5. Multiplication (2 minutes, 30 seconds)

    3.8125    34.4167    .2976    620.14
    8.875     21.72     1.093       7.963


6. Division (1 minute, 15 seconds)

    237) 50481    593) 28464    418) 240768


7. Division (carry to four significant figures) (3 minutes, 30 seconds)

    29.57) 128.43    .2448) 107.321    224.08) 3.11417


8. Multiplication by short-cut methods (1 minute, 15 seconds)

    793 X 25          65 X 65

    2641 X 33*        93 X 107

    732 X 199         47 X 47






9. Multiplication by short-cut methods (2 minutes) 

    2.183 X .875      892 X 908

    48027 X 901       81 X 81

    115 X 115         176 X 176

10. Division by short-cut methods (1 minute) 

    5418212 ÷ 25
    83.47 ÷ .05
    .4983 ÷ 12.5

11. Find the value of 

a) (28 X 37) + (12 X 16) - (31 X 29)

b) 3 + (9 X 36) - (22 X 11) + 486 + (138 ÷ 6)

1417-12{16+9(21-8)}

d) (86 X 22) + 44 + (98 + 7) X 210 - 432 ÷ (12 X 12)

1217[81 X {5952 ÷ (31 X 32) + 4} - 900]

12. Find the value of 

*) i + i+i+yV 

*) i - y + 7 

3f+10A-6ii 

13. A can do a piece of work in 15 days, B can do the work in 18 days, and 
C in 25 days. What fraction of the work will the three working together 
perform in six days? 

14. In Problem 13, after A has worked four days and C six days, what is the 
difference in the fraction of the work performed by the two? 

15. 12^X1055 -f-649J-= ? 

16. Mr. Smith invested $27,500 in a partnership having a total capital of 
$200,000. He later sold J of his holding in the concern. What fraction 
of the ownership of the concern did he retain? What fraction did he sell? 
What amount should Mr. Smith receive of a $12,000 profit, (1) origi- 
nally, (2) after selling 1 of his equity? 

17. If 16 items are to be plotted at equal distances and centered on a sheet 
of graph paper 9} inches wide and the space allotted to each item must 
be a multiple of \ inch, how much space will be left for margins at each 
side of the paper? 

18. If 14-j tons of coal cost $122-^ what was the cost per ton? 

19. Express the following as (a) decimals, (b) per cents: ^ ff> -^ r \ T , -f^, 3g, 
13 A- 

20. Express the following as (a) common fractions, (b) per cents: .06, .003, 
.004167, .65, 3.1875. 






10 per cent, 




21. Express the following as (a) common fractions, (b) decimals:
6 per cent, 18f per cent, per cent, 262 per cent. 

22. Arrange each of the following in ascending order of value: 

(a) -43, TV, 37.5 per cent, .4, lo

(b) i per cent, .086, sir, iro per cent, ToVo

23. The spoilage on two crates of oranges each containing 210 oranges was 
17 per cent and 33 per cent, respectively. Find the income from sale of 
the unspoiled oranges at 45 cents per dozen. 

24. Assessments in a city are maintained at 70 per cent of market value. If 
Mr. White pays $302.40 tax when the tax rate is $30 per thousand of 
assessed valuation, what is the market value of his property? 

25. The balance sheet of a concern showed the following: 

Cash $ 7,500 

Accounts receivable 38,500 

Inventory 15,750 

Investments 9,050 

Plant 120,000 

Equipment 69,200 

Total assets $260,000 

Each type of asset was what per cent of the total? 

26. An article cost $450. At what price should it be sold (a) to make a profit 
of 55 per cent on cost, (b) to make a profit of 45 per cent on the selling 
price?

27. If a worker's wages are cut 25 per cent and subsequently increased 25 per 
cent, the most recent wage is what per cent of the original wage? 

28. Given the following: 



MONTH        GROCERY SALES     No. OF DAYS STORE WAS OPEN

July            $28,412                   24
August           29,827                   26



Find the per cent of change in average daily sales in August compared with 
July. 

29. The following information is available concerning the manufacture of a 
particular article: 





                             1939          1940

No. of units produced       200,000       275,000
Overhead costs              $50,000       $50,000
Variable costs             $100,000      $120,000
Sales income               $200,000      $275,000



a) The per cent of profit on selling price increased by what per cent 
in 1940? 




b) The per cent of profit on cost increased by what per cent in 1940? 

c) The overhead per unit was what per cent of the selling price per unit 
in 1939? in 1940? 

d) The variable cost per unit was what per cent of the selling price per 
unit in 1939? in 1940? 

e) What discount on selling price could the manufacturer have offered in 
1940 and still have maintained the same rate of profit as in 1939? 

30. Find the square root of each of the following by arithmetic process and 
check the result in a table of square roots or by logarithms. 

a) 360046 c) 9.62048 

b) 65.604 d) 12089.37 

31. What is the degree of accuracy of each of the following measurement num- 
bers? Express the answer as a common fraction and as so many per 1,000 
or per 10,000 whichever is preferable: 

(a) 67 (d) 4208

(b) 18.2 (e) 508.0 

(c) 4200 (f) .0007 

32. How many significant figures are there in each number of Problem 31? 

33. a) Round each of the following numbers to four significant figures. 
b) Round each of the following numbers to three significant figures.

(1) 787428 (5) 9989.47 

(2) 13004 (6) 695.451 

(3) 27.998 (7) 164850 

(4) 4055.5 (8) 28.9950 

34. Which of the following are counting numbers and which are measurement 
numbers ? 

a) The three plots which Mr. Jones purchased were, respectively, 40 ft.
X 120 ft., 100 ft. X 150 ft., and 20 ft. X 250 ft. The total area was,
therefore, .57 acres.

b) The 65 persons who were on the payroll sometime during the year were 
the equivalent of 43 full-time workers and the total payroll was 
$62,712.85; hence the average annual wage per equivalent full-time 
worker was $1,458. 

35. How many figures would you expect to be accurate in each of the following? 
Give reasons for your answer in each case. All of the examples were taken 
from the Statistical Abstract of the United States, 1938. 

a) The population of the United States was enumerated in 1930 as 
122,775,046 persons. 

b) The population of the United States was estimated by the Bureau of 
Census in 1938 as 130,215,000 persons. 

c) The Office of Education of the Department of Interior reports the 
enrollment in colleges, universities, and professional schools in 1936 
as 1,062,760 students. 




d) The total assets of all member banks of the Federal Reserve System 
on the December 31, 1937, call date were $46,785,512,000. 

e) The Bureau of Foreign and Domestic Commerce of the Department 
of Commerce estimates from a sample collection that the total retail 
trade of the United States in 1937 amounted to $39,930,000,000. 

In each of the following problems express the summary figures to the correct 
number of significant digits. 

36. 

A 

LIABILITIES OF FEDERAL INTERMEDIATE CREDIT BANKS, 
DECEMBER 31, 1937 

                                                                 VALUE
LIABILITY                                               (thousands of dollars)

Paid in capital and surplus, United States government          100,000
Surplus, earned reserves and undivided profits*                  12,561
Debentures outstanding (unmatured)†                             174,950

    Total                                                       287,511

*Net amount after deducting impairment or deficit.
†Adjusted for debentures held by banks of issue and by other federal intermediate
credit banks.

B 

PRODUCTION, TRADE, AND SUPPLY AVAILABLE FOR CONSUMPTION OF RAW SUGAR, 
CONTINENTAL UNITED STATES, 1935 

                                              QUANTITY
ITEM                                        (short tons)

Production (beet and cane only)              1,651,000
Brought in from insular areas                2,686,969
Imports as sugar                             2,372,066
Exports as sugar                               103,349
Exports in other forms                          13,220

Available for consumption                    6,593,466

37. (a) In the following table, how many significant figures should be retained 
in the total consumption? (b) Assuming the accuracy of a population of 
129,257,000 in 1937, what is the per capita consumption of meats in the 
United States? 

PRODUCTION, FOREIGN TRADE AND CONSUMPTION OF ALL MEATS 
IN THE UNITED STATES, 1937 

                                               AMOUNT
ITEM                                      (million pounds)

Production
    Federally inspected                        10,273
    Uninspected (estimated)                     5,299
Exports of United States production               164
Imports for consumption                           263
Net change in storage stocks, decrease            402

Consumption                                    16,073

38. Find the value of a corn crop estimated at 4,000 bushels, if it was sold for 
87 cents per bushel. 

39. A motorist drove 3,532 miles from Boston to San Francisco, using 207^ 
gallons of gasoline. What was his average mileage per gallon? 




REFERENCES 

CHAMBERS, GEORGE G., An Introduction to Statistical Analysis. New York: 
F. S. Crofts and Co., 1925. 

Chapters I and II deal with measurement, approximation and significant 
digits. 

EDGERTON, EDWARD I., and BARTHOLOMEW, WALLACE E., Business Mathe- 
matics. New York: The Ronald Press Co., 1923. 

Chapter XII explains short methods of computation and various checks of 
accuracy. 

LANGER, CHARLES H., and GILL, T. B., Mathematics of Accounting and Finance. 
Chicago: Walton School of Commerce, 1930. 

The first five chapters contain a detailed statement of the fundamentals of 
arithmetic and algebra. Short cuts are presented in pp. 43-64. 

MURPHY, PATRICK, Short Practical Rules for Commercial Calculations. Albany:
Weed-Parsons Printing Co., 1910 (originally printed in 1886). 

A highly stimulating presentation of short-cut methods for the student who 
cares to pursue the subject at length. 

WALKER, HELEN M., Mathematics Essential for Elementary Statistics. New 
York: Henry Holt and Co., 1934. 

Chapters I-VI contain material similar to part of the text. Chapter II on
significant figures is particularly pertinent. 



CHAPTER III 
STATISTICAL INVESTIGATION 

THE CHARACTER OF STATISTICAL INVESTIGATION 

THE EXTENT to which the work of the statistician underlies the 
conduct of business affairs was discussed in Chapter I. Sometimes
the contribution which he makes is relatively simple,
being confined merely to presenting sales figures graphically. On the 
other hand his task may consist in a study of sales records and indexes 
of regional purchasing power for the purpose of determining sales 
territories and establishing sales quotas, or the study of a sample of 
output to determine whether it meets contract specifications. Whatever 
the complexity of a particular problem, the sequence of steps followed 
in its solution involves the application of statistical method. 

Definition of Statistical Method 

The statistical method is essentially the use of the principles of 
scientific investigation in the study of aggregates of numerical infor- 
mation. Just as the physicist must develop laboratory methods and 
techniques for examining the theories of sound, light, etc., so the 
statistician must have methods of appraising the theories of proba- 
bility and sampling in terms of the observed phenomena (numerical 
data) of the business world. The problem of the statistician is com- 
plicated considerably by the fact that business operations cannot be 
subjected to the control that is possible in the physics laboratory. As a 
result the methods of statistical investigation are those research pro- 
cedures developed to meet the peculiar requirements of the problems 
arising in the conduct of business affairs. 

An example will demonstrate the difference between the controlled 
conditions of the physics laboratory and the uncontrolled conditions 
of the statistics "laboratory." The physicist wishing to read the height 
of a column of mercury in a manometer tube sets up his apparatus, 
provides for a constant temperature in his laboratory, selects a time 
at which barometric pressure is stable, and proceeds to take a large 
number of readings on the scale attached to his apparatus. The average
of a large number of such readings will be the theoretically best
value for the height of the column of mercury in the tube. In contrast 
to this, suppose that a statistician wishes to determine the per cent of 
the employable workers of the United States who are unemployed as 
of a given date. He might also elect to take a large number of inde- 
pendent readings of the phenomenon under investigation and take the 
average of the results as the best value for the per cent of workers 
unemployed. But he encounters a whole mass of preliminary problems 
before any observations can be made and none of these can be "con- 
trolled." He must define "unemployed person," "employed person," 
"employable person," and no matter how carefully these definitions 
are phrased doubtful cases will arise. He must determine how to 
select samples of the population which will be representative, and even 
the most meticulous care will not produce a result comparable with the 
stability of the column of mercury in the physicist's laboratory. These 
and similar problems have forced the statistician to develop methods 
of investigation which are peculiar to the type of data with which he 
deals and the uncontrolled conditions under which he must use them.

The employment of statistical methods in the solution of business
problems belongs almost exclusively to the twentieth century. At an 
earlier date when business enterprises were small, management was 
able to comprehend its problems in detail by personal contact. The 
increased size of concerns in the present period has required more 
planning and greater regimentation of operations. At the same time 
management has found it impossible to maintain personal contact with 
its problems. The alternative is control through the interpretation of 
numerical information. This chain of circumstances has led to the 
introduction of statistical methods of investigation as a primary aid 
in the performance of the function of management. 

The Use of Statistical Method 

Masses of Data. The methods used by life insurance actuaries 
give no information concerning the time at which a particular insured 
person will die, but they give very accurate information concerning 
the number of persons who are likely to die in any year out of a 
large number of a given age alive at the beginning of that year. 
Life insurance premiums are based upon the regularity of death rates 
among large groups of persons, not on a guess as to how long an indi-
vidual will survive. Similarly, a study of department-store experience
may show that bad debt losses on charge accounts amount on the aver- 
age to about 1 per cent of charge sales. It does not follow that an 
individual store must have bad debt losses of 1 per cent. This result 
can be applied to particular cases only by taking account of the relation 
of conditions in the individual case to the average conditions found 
in the large group. The individual store may have 2 per cent bad debt 
losses in a certain year due to the fact that its customers have been 
experiencing the effects of a great amount of unemployment. In an- 
other year when its customers are fully employed the same store may 
have only of 1 per cent bad debt losses. Another example is the 
use of income tax statistics in the determination of sales quotas. 
Studies show that the higher the percentage of the population of a 
state filing income tax returns, the higher the percentage of the popu- 
lation purchasing automobiles. This relationship can be used to estab- 
lish sales quotas for automobile agencies in the various states. It does 
not follow, of course, that those persons who file income tax reports 
will necessarily purchase automobiles, but that the higher purchasing 
power evidenced by the larger percentage filing tax reports will be 
available for the purchase of automobiles. Hence intensified sales effort 
where the purchasing power exists should produce the best sales 
results. 

These three examples show how management uses the results 
obtained from the study of mass information. The typical situation 
found in the group is used as a guide for action within individual 
concerns. 
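
The regularity of large groups may be illustrated numerically. The following sketch is not the actuary's method described above. It assumes, for illustration only, that each member of a group dies independently within the year with a hypothetical probability of 1 per cent, and it reports the standard deviation of the observed death rate as a proportion of that rate. The function name relative_spread is likewise hypothetical.

```python
from math import sqrt


def relative_spread(rate, group_size):
    """Standard deviation of the observed rate, as a proportion of the true rate.

    Assumption: outcomes are independent, so the count follows a binomial model.
    """
    return sqrt(rate * (1 - rate) / group_size) / rate


hypothetical_rate = 0.01   # illustrative figure, not taken from any mortality table

for group_size in (100, 10_000, 1_000_000):
    spread = relative_spread(hypothetical_rate, group_size)
    print(f"{group_size:>9,}  {spread:.3f}")
```

Under these assumptions the relative spread falls as the group grows, which is why a rate that says nothing about one person can still be dependable for the group as a whole.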

Case Investigations. There is, however, one type of statistical work 
known as the case method which does not deal with masses of data. 
An individual case is studied intensively, usually over a period of time, 
in order to make a complete analysis of its operations. The case may 
be one individual, a single family, a business concern, or any other 
similar entity. 

In statistical work case investigations are of less frequent occur- 
rence than mass data investigations. More often than not case studies 
eschew statistical method entirely and rely solely on historical descrip- 
tion in the presentation of results. A case study is characterized by the 
establishing of such a strong personal relation between the investigator 
and the person or persons furnishing information that a vast amount 
of detailed information can be obtained concerning the case. In pre-
senting the results, each case is written up separately and represents a
complete investigation in itself. The distinguishing feature of the case 
method is the fact that a detailed description of the individual case is 
the objective. 

The records maintained by physicians concerning their patients 
become most complete life-histories, sometimes covering the entire 
span from the cradle to the grave. These records are, of course, con- 
fidential, but their anonymous publication would provide a remark- 
able background for the study of sickness and health problems. In a 
similar fashion a file for a period of years of a financial manual such 
as Moody's Manual of Investments, giving as it does a brief case his- 
tory of many individual corporations, becomes a compendium of invalu- 
able case records of the founding, growth, financial organization, and 
in some instances the decline of individual concerns. These are avail- 
able for study either as individual cases or collectively as the raw 
material for statistical analysis. 

Case study is used infrequently in the investigation of business prob- 
lems. It is a method that is well adapted to studies of social phenomena 
and has been widely used in the field of social work. As such it lies 
outside the scope of this book. 

THE CANONS OF STATISTICAL INVESTIGATION

The attitude of the statistician toward his work is a matter of con- 
siderable importance. His methods are equivalent in the field of social 
science to those employed in the exact sciences by the chemist, physicist, 
and biologist and his attitude toward his work must be equally 
scientific. Under no circumstances can he become an advocate or a 
special pleader. Statistical work done for purposes of pleading does 
not deserve the name of scientific research. 

As a means of promoting the scientific character of statistical inves- 
tigation there are certain standards or requirements which should be 
uniformly maintained. These fall naturally under three heads, each 
of which requires detailed explanation. 

Definite Object 

Statistical investigation is never aimless. It is always directed to 
the solution of a specific problem. The problem may be as basic as 
finding the total annual income of the nation or as circumscribed as 
a study of the amount of flour hauled on the New York State Barge
Canal during September, 1937. But regardless of scope the purpose 
must be specifically defined. Unless this requirement is met, direction 
will be lacking in the investigation, unnecessary work will be done 
and results of questionable value will be obtained. It is essential there- 
fore to have the exact object of an investigation fully understood before 
any other work in connection with it is undertaken. At all subsequent 
stages of the investigation the purpose must be kept in mind as a 
guide in the planning and execution of the project. 

Unbiased Attitude 

The statistical investigator sets out to determine by investigation the 
facts concerning a given problem, but not to prove a certain thesis. 
There are times when it is very difficult to maintain an unbiased atti- 
tude. Some questions are of such controversial character that even 
the most detached investigator finds himself influenced. On the other 
hand in reading the report of an investigation one frequently has a 
feeling that the author has "leaned backward" to avoid bias. This is 
the proper attitude of a careful investigator when he finds himself 
placed at the center of a controversy. 

Conscious or unconscious bias may appear in statistical work. Con- 
scious bias can be dismissed quickly. A person who willingly distorts 
statistics for the purpose of proving a preconceived idea should not be 
called a statistician. He is a propagandist. It is necessary to be vigilant 
at all times to avoid using results containing bias. Conscious bias may 
appear in one or several of the following forms: (1) direct misstate- 
ment, (2) ambiguous statement, (3) the use of only favorable data, 
(4) concealed shifting of units of measurement, (5) deliberate selec- 
tion of incorrect techniques, and (6) misleading forms of presentation. 

Careful study is usually required to detect unconscious bias. Perhaps 
it would be safe to assume that all statistical interpretations contain 
some bias but that in most cases it is not present to a harmful degree. 
This is only another way of saying that the results of statistical work 
must be interpreted by human beings, each of whom can interpret only 
in terms of his own experience and his attitude toward the problem at 
hand. An excellent example of unconscious bias appears in the writ- 
ings of certain statisticians and economists who during 1928 and 1929 
interpreted the trends of the times to mean that permanent prosperity 
at the then existing levels had been attained. Subsequent events have 
shown that these men were so enamored of the favorable factors that
they overlooked the growing stresses in our economic system. Their 
biased attitude is apparent now, but at the time their teachings had 
a wide acceptance. 

Skepticism 

The beginner in statistical work is likely to have the attitude that 
numerical facts can be accepted without question. A few adverse 
experiences will usually dispel this initial trustfulness. The attitude 
of faith should then be replaced by skepticism or in the extreme by 
cynicism, because it is far better to err in that direction than to develop 
enthusiasm with its attendant misinterpretation. Many of the fallacies 
which appear in statistical presentation arise from failure of those 
responsible for the results to maintain a critical attitude toward their 
work. 

STEPS IN STATISTICAL INVESTIGATION 

As stated at the beginning of the chapter, there is a logical sequence 
of steps to be followed in statistical investigation. An outline of these 
steps will give the reader a view of the process as a whole prior to 
studying the details. 

I. Statement of the problem
II. Preliminary planning of the investigation
III. Collection of data
    A. Library sources
    B. Direct sources
IV. Analysis of data
    A. Editing of collected information
    B. Tabulation
    C. Ratios
    D. Graphs
    E. Measures of central tendency
    F. Measures of dispersion and skewness
    G. Index numbers
    H. Time series analysis and application
    I. Correlation and variance
    J. Tests of reliability of samples
V. Interpretation and application of the results of analysis
VI. Preparation of a report of the completed investigation




The remainder of the book is devoted to a detailed presentation of 
the work involved in following through the several steps. Although 
the emphasis given on subsequent pages to the different parts of this 
outline depends upon the difficulty and ramifications of the particular 
subject, it is to be hoped that the reader will not lose sight of the fact 
that with one necessary exception he is following the outline step by 
step. The interpretation of the results of each type of analysis is not a 
procedure that can be relegated to a separate section of the book. 
Therefore the discussion of illustrative examples has been woven into 
the text wherever it has seemed desirable. 

THE SCOPE OF DIFFERENT INVESTIGATIONS 

The six major steps presented here cover the complete sequence of 
things that must be done in conducting an investigation, no matter 
how limited or how broad its scope. The subheadings are partly 
alternative and partly sequential depending on the character of a par- 
ticular problem. The amount of detailed planning required and the 
time consumed in executing the plan will, of course, vary with the size 
and importance of the investigation. The type of planning in turn 
is directly related to the question of whether the investigation is in- 
ternal or external in character. An internal investigation is one which 
deals exclusively with conditions within a single business concern or 
agency. Those investigations that originate outside the management 
of any particular business concern are called external. There is one 
great difference between internal and external investigations: the 
former as a rule present no serious problems of collecting data, 
whereas the latter are seldom free from such problems. 

Internal Investigations 

Statistical studies by a business concern of its own records are usually 
conducted to obtain information needed to assist management. The 
most common examples are found in the work of the cost accounting 
department. The data for determining unit costs are found in the 
accounting department and in the plant production records. The task 
of allocating the various factory and overhead costs to obtain an aver- 
age cost per unit of product requires the use of statistical techniques. 
The cost accountant ordinarily does not employ all of the steps of 
investigation as outlined. His task is a rather circumscribed one, but it
is necessary for him to be familiar with the complete process in order
to make intelligent use of the part that he needs. 

Internal investigations of a more general character are a necessary 
part of business control, and these are likely to make use of more of the 
steps of statistical investigation. For example, an oil company wishing 
to study the weekly rhythm in the sales of gasoline at its filling stations 
in different parts of a city would have no difficulty in collecting data, 
since that information would be included in the daily report of the 
manager of each station. Combining the sales reports for a number of 
weeks to obtain an average relationship would involve certain adjust- 
ments for weather conditions, for any irregularities at the station which 
might affect sales, for holidays and other circumstances of a similar 
nature. Once the pattern of the weekly rhythm at each station had been 
obtained, the next step would be to study these patterns in pairs and 
groups to discover in what parts of the city similar rhythms appeared. 
This information could be used by the central office in assigning attend- 
ants so as to provide the maximum service to customers, in planning 
delivery schedules of tank trucks, as well as in planning the location 
of new stations. 
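
The averaging step in the filling-station example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weekly_rhythm` and the sales figures are hypothetical, and days distorted by weather, holidays, or station irregularities are assumed to have been adjusted or removed beforehand. Each weekday is expressed as a percentage of the average day, so that stations of different size can be compared.

```python
from collections import defaultdict
from statistics import mean


def weekly_rhythm(daily_sales):
    """Express each weekday's average sales as a percentage of the average day.

    daily_sales: list of (weekday, gallons) pairs covering several weeks.
    The figures passed in are assumed to be already adjusted for weather,
    holidays, and other irregularities.
    """
    by_day = defaultdict(list)
    for day, gallons in daily_sales:
        by_day[day].append(gallons)

    # Average sales for each weekday across the weeks observed.
    day_means = {day: mean(values) for day, values in by_day.items()}

    # The average of the weekday averages serves as the base (= 100).
    base = mean(list(day_means.values()))
    return {day: round(100 * m / base, 1) for day, m in day_means.items()}


# Hypothetical figures for one station, two weeks, two days of the week only.
station_a = [("Mon", 80), ("Sat", 150), ("Mon", 120), ("Sat", 150)]
pattern = weekly_rhythm(station_a)
```

Patterns computed in this way for each station could then be compared in pairs and groups, as the text describes.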

In this example the emphasis is on analysis and interpretation, and 
that will be found commonly true of internal investigations. Although 
relatively simple statistical techniques are involved in this case, trust- 
worthy conclusions depend upon following the steps of investigation 
faithfully. This leads to the general observation that a knowledge of 
the steps of statistical investigation and the relation of each to the 
whole is necessary to protect the statistician from error, even though 
his particular problem involves the use of only a part of the whole 
procedure. 

External Investigations 

Investigations conducted by manufacturers' and trade associations, 
advertising agencies, research bodies, universities, and government 
agencies are ordinarily more general in character than internal investi- 
gations. Correspondingly they call for the use of a wider range of 
statistical techniques. In particular, the preliminary planning and the 
collection of data demand much more attention in external investiga- 
tions. For example, the plans for the 1940 census of population were 
under way as early as 1937 in the Census Bureau at Washington and 
in the field with co-operating agencies. The field work required only
a few weeks early in 1940, but a large staff will be continuously
engaged for the next decade in preparing and publishing the various 
tabulations and analyses of the collected information. The entire 
process, in reality a continuous statistical investigation of the popula- 
tion, falls within the framework of method outlined in the preceding 
section. 

Sometimes the scope of an investigation is limited in the sense that 
not all of the successive steps are carried on by a single agency. The 
task may be confined to collecting data. If so, the steps following 
collection can be ignored, but those preceding actual collection must be 
given proper attention. Again the particular task may be confined to 
interpretation and presentation of data collected by someone else, but 
the preceding steps must be thoroughly understood before any attempt 
is made to explain the meaning of the results. 

SUMMARY 

All of this discussion points to the same conclusion, namely, no 
matter how simple a particular piece of statistical work may be, its 
execution requires a knowledge of the steps in statistical investigation. 
Through this principle the various details of statistical method and 
technique are welded into a unified whole. Succeeding chapters are 
arranged so that the steps of statistical investigation will appear in 
natural sequence. Within this sequence the methods of analysis 
progress from those using simple techniques to those which are more 
involved. 

PROBLEMS 

1. What are the differences between research in the natural sciences and 
statistical research? 

2. Describe an example from your own experience of the use of mass data in 
statistical work. 

3. State a definite subject for investigation in each of the following fields: 
(a) automobiles, (b) cost of living, (c) athletics, (d) profit. 

4. What changes would you suggest in the conclusions reached in each of 
the following examples?

a) Hourly wage rates in industry have increased uninterruptedly for the 
past 20 years, and the cost of living is lower today than it was 20 
years ago. Therefore the living standard is higher today than it has 
ever been in the past. 




b) In the District of Columbia in a recent year one male automobile driver 
in every 1,370 was involved in a fatal accident and one female driver in 
every 9,090 was involved in a fatal accident. Therefore women are 
safer drivers than men. 

c) During World War I the United States army lost 126,000 men killed
in action, died of wounds, and died from disease or other cause out of 
4,355,000 men mobilized, a death rate of 28.9 per 1,000. During the 
years 1917 and 1918 the death rate for the United States exclusive of 
the armed forces was 32.4 per 1,000. Therefore it was safer in the 
army than at home. 

5. Explain the differences between internal and external investigations. 

6. Which of the steps of statistical investigation were employed in preparing 
the reports appearing as Examples 1, 2, and 3 in chapter I, pages 7-9? 
Give references in the examples of specific statements which indicate the 
use of the several steps named in your answers. 

REFERENCES 

CROXTON, FREDERICK E., and COWDEN, DUDLEY J., Applied General Statistics. 
New York: Prentice-Hall, Inc., 1939. 

Chapter I emphasizes the parts of statistical investigation and the kinds of 
errors that appear in statistical work. 

MILLS, FREDERICK C., "On Measurement in Economics," The Trend of Economics.
New York: Alfred A. Knopf, 1924, pp. 37-72.

An advanced statement of the place of statistical investigation in the
realm of science.

SECRIST, HORACE, "Statistical Standards in Business Research," Quarterly Pub- 
lications of the American Statistical Association, Vol. XVII, No. 129 (March, 
1920), pp. 45-58. 

An article that helped to establish standards in a period when business 
research was less firmly established than at present. 

SPAHR, WALTER E., and SWENSON, RINEHART J., Methods and Status of Scien- 
tific Research. New York: Harper and Bros., 1930. 

Every embryonic statistical investigator should be familiar with the point 
of view expressed in chapters I-VI. 



CHAPTER IV 
PRELIMINARY PLANNING OF INVESTIGATIONS 

INTRODUCTION 

INEXPERIENCED collectors of data sometimes make the mistake 
of jumping directly into the task of collection without an adequate 
comprehension of the problem with which they are dealing. This 
practice should never be followed no matter how simple and direct 
the problem may appear to be. There are always preliminary points 
which should receive attention prior to the actual collection of data. 
The four major steps which should be followed are: (1) define the
problem; (2) study the problem; (3) plan the procedure; (4) prepare
a statement of the program.

DEFINE THE PROBLEM 

At the outset a crude statement will serve as a focus for the initial 
consideration of what is involved in the problem, but the crystallization 
of a few ideas will very quickly provide a mental setting and point to 
some of the limitations which should be established as a basis for more 
careful planning of the investigation. These preliminary ideas should 
be brought together in a more complete definition which will indicate 
the subject to be investigated, the exact object of the investigation and 
the limitations upon its scope. 

An example will demonstrate the difference between an incomplete 
and a complete statement of the subject for research. Suppose that 
the statistician were to receive the following problem: "The sales of 
our company declined last month. This decline was unexpected, since 
all parts of the organization appeared to be unusually busy. Investi- 
gate the matter." This statement does not define the problem for 
research. It could be taken to mean that an investigation was wanted 
of why the organization appeared to be unusually busy; but assuming 
that an investigation is wanted of why sales declined, the statistician 
needs more information before proceeding with the work. Questions 
such as the following must be settled: 


Are all products to be included or only major ones? If the latter, 
which products? 

Is the investigation to be confined to discovering the facts or shall 
it include data pertinent to discovering the cause of the decline? 

How much time is available for making the investigation? 

Have the affected departments agreed to co-operate? 

With questions of this sort settled, the problem might be restated 
somewhat as follows: "Investigate the extent of the decline last month 
in the sales of the five major products which we manufacture and pre- 
sent as much collateral information as possible to aid in determining 
the cause of the decline. Your report should be available prior to the 
directors' meeting which will be held three weeks hence." The purpose 
of the investigation is clear and the limitations as to time and scope 
are definite. 

This example illustrates the type of definition required in an internal 
investigation to be carried out within a short period. Larger problems 
will require a correspondingly greater amount of definition.

STUDY THE PROBLEM 

Read About the Problem 

A knowledge of previous work that may have been done on a problem
should be acquired as background before a new investigation is
undertaken. The existence of earlier studies can be determined pri- 
marily through a search of library files. One may find that the 
problem has been investigated previously and that any further investi- 
gation should be built upon the existing work. Again flaws may be 
found in the previous work which make it completely or partially 
useless for the purposes in view. The chief value of studying such 
previous investigations may lie in discovering what not to do. 

Library search may disclose the fact that no similar statistical inves- 
tigation has been made previously. But books and magazine articles 
may be discovered which give factual information dealing with some 
phase of the subject or clues concerning methods of investigation. 
Library reading on the subject will aid the investigator in avoiding 
duplication of work already done; in avoiding the errors made in 
previous investigations; in discovering methods of approach and pro- 
cedure; and in acquiring a broad perspective of his problem. 

Finally there will be some cases in which no usable information of
any sort can be gleaned from library search. When this happens the
investigator must be prepared to proceed without such assistance. He 
must be able to supply from his own experience the background that 
otherwise would have come from library reading. 

Think the Problem Through 

At this stage the investigator should take some time for thoughtful
consideration of his problem. There are major parts of his plan which
should be settled. Certain parts may need additional emphasis and 
others should perhaps be discarded. New phases may enter as a 
result of the reading done. The knowledge which has been acquired 
by reading needs to be related to the particular problem at hand. The 
investigator should be able to visualize his entire procedure. At first 
this should be confined to the main outline of the work, and following 
that the details should be considered. In constructing this mental 
image of the work it is unwise to assume that the preliminary planning 
can ignore details that are apparently simple, for they may contain 
difficulties. The success of the true investigator lies in his ability to 
foresee these concealed difficulties and make provision for them. The
case of the student who wrote to a number of cement companies asking 
for production in tons and price per barrel of cement will illustrate 
the point. The companies had to change their figures, which were 
recorded in barrels, to tons to meet the student's questions and the 
student in turn had to change from tons back to barrels when he 
tabulated the data. A small amount of foresight would have avoided 
the difficulty. 

A word of caution may aid in avoiding misinterpretation of the pre- 
ceding paragraph. While the plan of procedure should be thought 
out very carefully, it is scarcely to be expected that no subsequent 
changes will be necessary. Regardless of how efficient the investigator 
may be, it is unlikely that he can foresee and provide for every
contingency which may arise. The plan should therefore be sufficiently
flexible to permit necessary adjustments to conditions as they develop. 

PLAN THE PROCEDURE 

The amount of planning needed will be determined by the com- 
plexity of the problem and the size of the investigation. In some cases 
the points discussed in this section will take care of themselves, but
more commonly decisions concerning them must be made prior to
beginning the collection of data. Under either circumstance considera- 
tion must be given to the elements of the plan lest some essential be 
overlooked. 

Library Sources and Direct Sources 

The reading which has been done on a problem should indicate 
fairly well whether the needed data will be available in libraries or 
whether recourse must be had to direct sources. It will not serve merely
to remember that some data on the subject were referred to in a book 
or magazine article. The data must be found and examined. Then 
several questions must be settled. Are the data in usable form? Do 
they include the desired time period? Do they cover the proper area, 
i.e., nation, state, locality, etc? Are they expressed in the correct unit 
for the particular purpose? Are they reliable? By the time these 
questions have been investigated the general problem of whether or 
not library sources can be used will have been definitely settled. If 
library sources can be used, the investigator is ready to move on to the 
next part of his plan. 1 If on the contrary library sources should fail 
to provide any or all of the required data, the possibility of securing 
them directly must be canvassed. 

Using direct sources 2 means going to the business concerns, agencies, 
or individuals possessing the information to obtain at first hand data 
which do not exist anywhere in print. In some cases the preliminary 
survey may disclose the fact that the desired data cannot be found in 
library sources and that they are equally unavailable from direct sources. 

1 The details concerning the collection of data from library sources are presented in 
chapters IX and X. 

2 The classification of data as library and direct, a distinction according to source, 
represents something of a departure from the usual classification found in textbooks. The 
customary division into primary and secondary data places the emphasis on the number 
of times that data have been recorded, i.e., primary data are those which are being 
recorded for the first time by the investigator who assembles them, whereas any subse- 
quent recording of the data by other than the original investigator makes them secondary 
data. For example, the report of steel production found in the Annual Statistical Report 
of the American Iron and Steel Institute is primary data and the Report is a primary 
source, whereas the same figures published in the Survey of Current Business of the 
United States Department of Commerce become secondary data, and the Survey a secondary 
source. The names, primary and secondary, suggest that the former are more reliable 
than the latter. The point of view of this book is that reliability does not depend so 
much upon the number of times the data have been handled, as upon factors related 
to the canons of statistical investigation discussed in the preceding chapter, factors which
are as likely to affect one kind of source as another.

The distinction between library and direct sources places the emphasis on methods 
of procedure. One type of research is required to obtain data already available for gen- 
eral use, but quite a different type of work is required to obtain data directly from 
the originating source. 




For example, recently the Minimum Wage Division of the New 
York State Department of Labor undertook to obtain data on hours 
worked daily and weekly by operators in beauty parlors. The nature 
of the work in this trade is such that hours actually worked vary 
greatly from stated schedules, but are unrecorded except in the very 
large establishments. As a result, the field workers found it very 
difficult to secure accurate information. When obstacles of this sort 
are discovered, the choice lies between abandoning at least those diffi- 
cult features of the investigation, or continuing with the understanding 
that the results will have only conditional validity. On the other hand 
if the preliminary survey indicates that the data will be available from 
direct sources, the investigator is ready to enter into the detailed plan- 
ning of the work of collecting them. 

Sometimes a combination of library sources and direct sources can 
be used. For example, in comparing wage rates in a particular com- 
munity with rates for similar employment in the entire state and the 
entire country, it might be feasible to obtain the data for the state 
and the nation from the reports of the United States Bureau of Labor 
Statistics, whereas the local data would have to be secured directly 
from business concerns in the community. In all such cases it is desir- 
able to make as much use as possible of library sources. 

If the data required for an investigation can be obtained from library 
sources, the procedures discussed in the next sections will not be needed.
On the other hand if direct sources must be used, the investigator 
must be thoroughly familiar with the principles of sampling and with 
the practical technique of the collection process by the method of 
either sample or census. 

Census and Sample 

In some investigations it is desirable or even necessary to make a 
complete enumeration. This is known as the census method. The cen- 
sus method is used in part of the statistical work of the federal govern- 
ment. The decennial population census is a complete enumeration, as 
are the Census of Manufactures, the Census of Business, the Census 
of Agriculture, and others. Other complete collections of data are 
by-products of the tax-collecting function of the government. Examples 
of these are the statistics of imports, corporate and individual incomes, 
cigarette consumption, and gasoline consumption. 

In contrast to these cases of complete collection of data are the
great majority of external investigations in which the census method
is impossible. Instead of collecting all of the information concerning 
a given subject, these investigations depend upon obtaining a sample 
which will be representative of the whole. The methods of securing 
a representative sample will be discussed in detail in chapter V. We 
are interested here merely in pointing out that results representing a 
large population of items can be obtained by the use of a sample. In 
constructing a wholesale price index no attempt is made to include 
the price at which every wholesale transaction is made. The prices 
of only a few articles in important markets are used. The American 
Experience Mortality Table, giving the age at death of an initial
100,000 persons at age 10, was constructed from a large sample of insured
lives. Crop reports of the Department of Agriculture are based upon 
information received from local reporters in all parts of the country 
who have in the aggregate a knowledge of the condition of no more 
than 3 to 5 per cent of the acreage planted. 

Internal investigations should use complete enumeration when fea- 
sible because of the greater accuracy, but situations may arise in which 
the size of the investigation or the difficulty of securing data even 
within a single concern preclude the use of all of the data. A good 
case of partial enumeration occurred when a large mail-order house 
desired to have a check at two weeks' intervals on its gross profit or 
difference between total income from sales and cost of goods. The 
income from sales could be obtained from the accounting department, 
but to get the cost of goods sold in any two weeks' period would have 
been impossible from the point of view of both time and expense. 
The concern therefore took 100 orders at random 3 and computed the 
cost of the goods included in those orders. The results were applied 
to the 19,000 orders which the concern filled during a two weeks' 
period. 4 The method used did not lead to an exact answer to the 
problem, but the results were good enough to allow a current check 
on selling prices. In any event the time involved in using complete 
data would have forced the company to abandon the idea. 

3 A random sample results from taking individuals from a group by some system
that is in no way dependent upon the characteristics of the items chosen, so that the 
presence in the sample of any particular item is left entirely to chance. In this case 
the equivalent of a random sample could be obtained by taking every 190th order that 
appeared on the sales record. A similar result could be achieved by taking the first 10 
orders recorded each working day of the two weeks' period. Any methods similar to 
these would serve the purpose of providing a sample of 100 orders which would be 
representative of the 19,000 filled during the period. 

4 Example taken from page 76 of M. A. Brumbaugh and R. Riegel, Study Problems 
in Business Statistics. New York: American Book Co., 1935. 
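
The selection rule described in footnote 3, taking every 190th order from the 19,000 filled, can be sketched in Python as follows. The function names `systematic_sample` and `estimate_total_cost` are hypothetical, and the scaling step (mean cost per sampled order multiplied by the number of orders) is only an assumption about how the results might have been "applied"; the text does not say exactly how the concern did it.

```python
def systematic_sample(orders, sample_size):
    """Take every k-th order, where k = len(orders) // sample_size."""
    if not 1 <= sample_size <= len(orders):
        raise ValueError("sample_size must be between 1 and len(orders)")
    step = len(orders) // sample_size
    # Start at the k-th order and keep exactly sample_size items.
    return orders[step - 1::step][:sample_size]


def estimate_total_cost(sampled_costs, total_orders):
    """Scale the mean cost per sampled order up to all orders (an assumed method)."""
    return sum(sampled_costs) / len(sampled_costs) * total_orders


# Order numbers 1..19,000 stand in for the sales record of one two-week period.
orders = list(range(1, 19001))
sample = systematic_sample(orders, 100)   # every 190th order
```

Such a rule gives the equivalent of a random sample only if the order in which sales are recorded is unrelated to the cost of the goods, which is the condition footnote 3 sets out.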




A decision must always be made on the question of census or 
sample. Sometimes attendant circumstances will make the decision 
almost automatic; in other instances they may complicate it. A case 
in point is the unemployment survey made at the close of 1937 by 
the federal government. The necessity of having quick results made 
it desirable to rely on a sample. On the other hand previous experience 
indicated that only a complete enumeration would be reliable. A com- 
promise plan was used in which the reporting form was distributed by 
mail carriers to every family, and various channels of publicity were 
used to urge all unemployed persons to fill out the form and return 
it to the local post office. There was considerable doubt whether the 
reporting would be complete, so a check was made by house-to-house 
canvass in selected cities, villages, and rural areas. The check showed 
that the voluntary reporting was about 72 per cent complete, but that 
there was considerable variation in the completeness of registration 
from one locality to another. The sacrifice of correct method to secure 
a quick report lessened confidence in the result. 

The Collection Method. Agents and Mail Questionnaires

If it has been determined that a complete census must be taken, 
there is practically no choice as to method. Only by the use of agents 
for personal interviews and follow-up visits can 100 per cent collection 
be guaranteed. Even with the aid of compulsion by the federal gov- 
ernment, when data for the Census of Manufactures are collected by 
mail, it is necessary to send agents to secure delayed reports. 

When sampling is deemed satisfactory as a method of investigation, 
there are alternative methods of approach. In a study of limited scope, 
one investigator may make the plan and collect all the data personally, 
but as a rule some other method of collection must be employed. 
Agents may be sent out to secure replies to a list or schedule of ques- 
tions, or the personal element may be abandoned entirely in favor of 
the use of questionnaires sent through the mail. The two methods 
are sometimes combined when mail questionnaires are sent out to all 
from whom information is desired and after a reasonable period of 
time has elapsed agents are sent to those who have not replied. A 
variation of this method is employed when agents collect data in thickly 
populated centers and mail questionnaires are sent to respondents in 
less accessible regions. The detailed methods of collecting information 
by using agents or the mail will be presented in chapter VI, but the
decision as to which method to employ is an essential part of the
preliminary plan and rests upon a number of considerations. 

Importance of Personal Element. The function of the agent is to 
create a favorable attitude toward his mission, to explain doubtful 
points concerning the investigation, to encourage the informant to 
provide the desired information, and to record responses. These things 
cannot be done with a mail questionnaire. It loses immediately what- 
ever value inheres in the personal contact between an agent and an 
informant. The form and tone of the mail questionnaire should be 
designed to supply as far as possible the missing personal element, but 
the fact remains that a mail questionnaire is an impersonal appeal for 
information and the investigator must expect it to be treated as such. 

Area Covered. Investigation within a single city or local area can 
usually be done more thoroughly and in less time by agents. The 
agents can also be directly supervised and their completed schedules 
checked as they are turned in. All of these things add to the accuracy 
of the work. When a larger area is to be covered, direct supervision 
is impossible, a larger number of agents cannot be as well trained, 
and the value of direct contact with the informant is greatly diminished 
because of poorer agent technique. In general when a large area is to 
be covered mail questionnaires should be used, whereas agents are 
superior for investigations confined to small areas. There are, of course, 
exceptions to this rule as the subsequent discussion will show. 

Time Element. It is very difficult to confine an investigation using 
mail questionnaires to a fixed time period. The questions may require 
only five minutes to answer, and the need for immediate answers may 
be quite clear to the respondents, but actual experience shows that 
the replies will straggle back over a period of time. The usual pro- 
cedure is to close the collection arbitrarily after a reasonable period 
has elapsed and enough replies have been received to permit analysis. 
The more involved the questionnaire, the greater the uncertainty as to 
when the replies will be received. In planning the successive steps of 
an investigation using mail questionnaires, it is never advisable to allot 
a certain period such as two weeks or one month for the replies to 
be returned. Unless some flexibility is introduced into the time plan, 
the subsequent steps of the work are likely to be disorganized by 
unexpected delay in receiving filled-in questionnaires. 

An example from the writer's experience will illustrate what can 
happen. An association of worsted yarn manufacturers requested an
investigation of their equipment and trends in the production of different
kinds of yarn. A questionnaire was prepared and submitted to the
supervising committee of the association for approval. After making 
several suggested changes the form was printed and distributed to the 
96 members of the association during a meeting at which practically 
all members were present and agreed to co-operate by returning the 
information within 30 days. At the end of 30 days about 20 completed 
questionnaires had been received. Another month elapsed during 
which an additional 10 or 12 replies were received. A "follow-up" 
letter was then sent to all delinquents. This brought another 20 replies 
within a month. During the next six months cajolery, personal visits, 
and personal favors brought the total of completed replies to 70. The 
work of analysis was completed just about one year after the investiga- 
tion was initiated. 

In contrast to the uncertainty encountered in this example, the use 
of agents permits the establishing of a definite time schedule. The 
agents can be allotted fixed amounts of work and their operations 
can be carefully supervised. If two months, for example, have been 
allowed in the plan for agents to collect the data, it can be definitely 
expected that at the end of the two months all reports will be turned 
in. Nothing as certain as this can be anticipated when mail ques- 
tionnaires are used. 

Percentage of Replies. Where agents are used an investigator can 
lay his plans to get a certain number of cases and enough agents can 
be put in the field to secure the desired number within a specified 
time. No equivalent certainty concerning the number of cases can 
be introduced when mail questionnaires are used. Usually a large part 
of those to whom questionnaires are sent will disregard them; hence 
a return of 10 to 20 per cent on an ordinary investigation is the likely 
response. However, there are in particular cases circumstances which 
may result in a much higher or much lower percentage of return. 
Actual experience with questionnaire technique leads to certain gen- 
eral explanations of the small proportion of replies. These can be 
listed as follows: 

a) Some individuals and certain classes of the population have an 
aversion to giving any information under any circumstances. Others 
intend to reply but fail, due simply to inertia. 

b) The questionnaire method has been over-used to such an extent
that busy men throw all questionnaires into the wastebasket. 




c) The sponsorship of a well-known individual or agency may in- 
crease the percentage of replies, and the absence of any such identifi- 
cation may affect the percentage of replies adversely. 

d) As a rule, the shorter the list of questions, the higher will be 
the percentage of replies. 

e) Simple questions with "yes" or "no" answers will bring a better 
response than complicated questions. 

f) When the respondents have a direct interest in the subject matter
of the questionnaire, or when they will receive some personal or
group benefit such as a premium, a free sample, or a copy of the 
results of the study, the percentage of replies will be above the 
average. 

These are some of the factors that lead to the low percentage of 
replies received from a mail questionnaire and they must be taken 
into account in estimating how many questionnaires should be sent out 
in order to get a desired number of replies. 
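
The estimate described above can be sketched as a short calculation. This is only an illustration under a simplifying assumption: that the expected rate of return can be stated in advance as a single percentage. The function name `questionnaires_needed` is hypothetical, and the 10 and 20 per cent figures are the ordinary range of return quoted earlier in this section.

```python
import math


def questionnaires_needed(desired_replies, return_per_cent):
    """Number of questionnaires to mail to expect `desired_replies` back.

    `return_per_cent` is the assumed percentage of questionnaires returned
    (an assumption supplied by the investigator, not a known quantity).
    """
    if not 0 < return_per_cent <= 100:
        raise ValueError("return_per_cent must be greater than 0 and at most 100")
    # Round up: a fraction of a questionnaire cannot be mailed.
    return math.ceil(desired_replies * 100 / return_per_cent)


if __name__ == "__main__":
    for rate in (10, 20):
        needed = questionnaires_needed(1500, rate)
        print(f"At a {rate} per cent return, mail {needed} questionnaires")
```

Because the rate of return is itself uncertain, it is prudent to make the calculation at both ends of the likely range rather than at one figure.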

Cost. The question of cost is closely related to area covered. In a 
local investigation the agents can be assembled for training at nominal 
expense and their transportation costs while in the field are small. In 
a larger investigation centralized training means transportation for the 
agents to the training point and back to the field, while decentralized 
training involves transportation for the training staff or the establish- 
ing of a number of training staffs. All of this is not only expensive 
but extremely cumbersome. 

In an investigation covering a large area, mail questionnaires are 
usually less expensive than schedules collected by agents. Suppose 
that 8,000 letters were sent out in an investigation and 1,500 replies 
were received. The cost except for preparation of the questionnaire 
would be about as follows: 

8,000 envelopes at $4.50 per thousand .......................... $ 36.00
8,000 business reply envelopes at $6.50 per thousand ...........   52.00
8,000 addressing, folding and insertion at $2.25 per hundred ...  180.00
8,000 stamps at $.03 each ......................................  240.00
1,500 stamps (business reply rate) at $.04 each ................   60.00
                                                                 -------
      Total cost ............................................... $568.00

Cost per reply (568 ÷ 1,500) = $.38.

The estimated cost turns out to be 38 cents per questionnaire which 
is probably less than the cost of using agents. 
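
The arithmetic of this estimate can be checked with a short script. The quantities and prices are those given in the text; the layout of the list and the names `ITEMS`, `total_cost`, and `cost_per_reply` are merely one convenient, hypothetical arrangement.

```python
# Each entry: (description, quantity, price in dollars, number of units the price covers)
ITEMS = [
    ("envelopes", 8000, 4.50, 1000),
    ("business reply envelopes", 8000, 6.50, 1000),
    ("addressing, folding and insertion", 8000, 2.25, 100),
    ("stamps, outgoing", 8000, 0.03, 1),
    ("stamps, business reply rate", 1500, 0.04, 1),
]
REPLIES = 1500


def total_cost(items):
    """Sum of quantity x price / units-per-price over all items."""
    return sum(qty * price / per for _, qty, price, per in items)


def cost_per_reply(items, replies):
    """Total cost divided by the number of replies received."""
    return total_cost(items) / replies


if __name__ == "__main__":
    print(f"Total cost:     ${total_cost(ITEMS):.2f}")
    print(f"Cost per reply: ${cost_per_reply(ITEMS, REPLIES):.2f}")
```

Note that the cost is divided by the 1,500 replies received, not by the 8,000 letters mailed; a lower rate of return raises the cost per usable questionnaire even though the outlay is nearly unchanged.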

On the other hand it does not always follow that agents are cheaper 
for a local investigation. If this same study were made within a single
city, the expense of postage would be reduced, making the cost about
32 cents per questionnaire. If a corresponding estimate of the cost 
of using agents turned out to be more than 32 cents per schedule, then 
mail questionnaires would be cheaper even though the investigation 
were local in scope. 

Amount and Complexity of Information. If the number of ques- 
tions is small, answers can be obtained by mail. A long list of ques- 
tions practically precludes the use of the mail questionnaire because 
too few replies are likely to be received. To get replies to a long list 
of questions requires the persuasion of personal contact between agent 
and informant. Also the information which can be obtained by mail 
must be relatively simple. Questions which require lengthy explanations 
or interpretations or information which is difficult for the respondent 
to give, particularly if long statements are necessary to answer ques- 
tions, all tend to reduce the number of replies by mail. When an 
investigation involves asking questions of this sort agents should be 
used. Replies by mail will not be satisfactory either in number or 
in accuracy. 

A practical example will illustrate the circumstances under which 
one method is more suitable than the other. A research bureau collects 
retail prices of food articles monthly from 25 grocery stores and 
monthly sales from 50 drugstores. All of these stores are located in 
one city, yet the bureau uses agents to collect food prices but mail 
questionnaires to collect drugstore sales. The difference lies in the 
fact that any clerk can give the food prices or the agent himself can 
take them from the price tags. Also the food prices are available any 
time the agent appears at the store, whereas the sales figures for drug- 
stores are not made up until the manager or owner has time to 
prepare them. An agent might have to make several visits for the 
data. Further, the grocer would not bother to write down the 
prices of the 42 articles which appear on the schedule, but the 
druggist does not object to transferring a single sales figure from his 
ledger to the bureau's collection sheet. These examples illustrate 
the kinds of facts which can be obtained by agents and by mail 
questionnaires. 

Type of Information. Quite apart from the question of complexity 
there are certain types of information which can be obtained better by 
mail, while others are more suitable for collection by agents. Mail questions
must not offend. The same, of course, is true of the questions on a
schedule in the hands of an agent. However, the agent can get personal
information which cannot be obtained by mail. Skillful interviewing
may procure confidential information on subjects which would be
offensive in the absence of the personal element. In 1936 the United 
States Public Health Service collected data by the use of agents from 
thousands of families all over the country on subjects entirely beyond 
the reach of mail questionnaires. Here are some examples from that 
schedule: 

1. What disabling illness occurred in the family during the past 
year? 

2. Is there other handicapping disease or condition? 

3. Has anyone in this home ever been examined for tuberculosis? 

4. Has anyone in this home been to a health clinic or health center 
during the past year? 

5. Is any member of the family crippled, deformed, or paralyzed? 

6. What is the annual family income? 

This schedule had 64 questions, most of them on a par with those 
given, which in each instance asked for details as to conditions, treat- 
ment, and physician in attendance. Information of this sort could 
not have been obtained by mail. 

Bias. When questionnaires are mailed to a list of business con- 
cerns or persons, that list has been selected as a representative group 
from which to obtain the desired information. At that point, however, 
the investigator's control over the group ceases. Some will reply, others 
will not. Are those who reply representative of the entire group? 
Experience shows that when a request is made to business men many 
who are not able to make a favorable report on the information re- 
quested will not reply at all. This tendency introduces a definite bias 
into the results and greatly reduces the value of the questionnaire 
method. An equally disconcerting bias enters when questionnaires are 
sent to individuals. Those with more education or more experience 
are likely to reply, whereas whole segments of the population which 
one may wish to reach will disregard the questionnaire entirely. In 
general then, a bias is likely to appear in the replies to a questionnaire 
because the ones who reply are not representative of those to whom 
the questions were sent. Notice that this is quite apart from any 
tendency of respondents to give biased answers, a difficulty which the 
investigator faces whether the data are collected by agents or by the 
use of questionnaires. 




Summary. In deciding whether to use agents or mail questionnaires 
in a particular case, all of these factors must be taken into account. 
Sometimes one will be decisive; at other times the balancing of all of them
will point to the preferable method.

Occasionally there is an advantage in using a combination of the 
two methods. A schedule of questions can be sent to the informant 
by mail with a request that it be given preliminary consideration pend- 
ing the arrival of an agent at a later date. This method is effective in 
investigations requiring complex information or where it is necessary 
to assemble the information from various offices of a business concern. 
The work can be done prior to the arrival of the agent, but the agent 
can go over the schedule to be sure the questions have been interpreted 
correctly. A modification of this method is used by the Department 
of Commerce in taking the Census of Business. 

PREPARE A STATEMENT OF THE PROGRAM 

In the course of attending to the various details arising in the 
preceding steps there is a chance that some essentials may have been 
overlooked, or that points originally included in the plan will sub- 
sequently be forgotten. To prevent such contingencies the entire pro- 
gram should be put in writing. There are several advantages to the 
investigator in doing this. It forces him to regain proper perspective 
with respect to the investigation. It permits him to pick up any loose 
end in his plan. It gives him a complete statement to which reference 
can be made in the future, if puzzling situations arise. It provides a 
preliminary outline for writing the final report. 

The statement of the program should be submitted to the sponsor 5 
of the investigation for approval. This step is particularly necessary 
when the project involves direct collection of data but is applicable 
to some extent even though the data are to be taken from library 
sources. All too frequently misunderstandings between investigator 
and sponsor arise subsequently because of failure to come to an agree- 
ment at the beginning as to exactly what will be done and how it will 
be done. The investigator should present his program in writing and 
in return insist upon a written approval from his sponsor. 

5 "Sponsor" means the organization or individual authorizing the investigation. Hence
the sponsor may be a board of directors, a board of trustees, a higher executive, an 
advertising agency, etc. 




PROBLEMS 

1. Each of the following is a statement of a problem for investigation. 
Rewrite any of them that fail to define the problem completely. 

a) Retail sales taxes have no adverse effect on the sales of cigarettes. 

b) Between 1938 and 1941 the movements of prices on the New York 
Stock Exchange can be explained largely by charting with them the
series of crises and tensions in European affairs. 

c) We (management) know that the change in the time of introducing
new models of the Kistler automobile from January to November
has changed the sales curve, but we are in doubt whether the expected 
decrease in the peak and trough of sales has occurred. Prepare a report 
on this question for the meeting of sales representatives on Sep- 
tember 23. 

2. A student was given the following assignment in a statistics class in 1935. 
"Has the center of the slaughtering and meat packing industry been moving 
westward during the past 40 years?" The student read The Jungle by 
Sinclair, a story of conditions in the industry in Chicago. He then pro- 
ceeded to collect data on the number of head of cattle, sheep, and hogs 
shipped from each state of the United States, and the number of animals 
slaughtered at various important cities, such as Omaha, Kansas City, Chi- 
cago, and Buffalo. He also collected data on the livestock receipts at 
principal markets. The student then sought help in completing the work. 

a) What criticism would you make of his work to date? 

b) How would you advise him to proceed? 

3. Which of the following are library and which are direct sources? 

a) The price of wheat is obtained from a daily paper. 

b) The sales of retail drugstores in a community are reported by the indi- 
vidual stores monthly to a research bureau which issues a monthly 
report of combined sales to the reporting stores and to newspapers. 

c) An advertising agency calls residences by telephone to inquire whether 
the radio in the home is in use. 

d) The federal income tax law requires that a copy of all tax returns be 
kept on file for public inspection in local offices of the Bureau of 
Internal Revenue. A student prepares a study of income distribution 
in his city based on these duplicate tax reports. 

4. Investigate each of the following in the reference given to determine 
whether the method of collection is by sample or census. 

a) Each year, beginning in 1935, the Department of Agriculture publishes 
complete information on agricultural production. Agricultural Statistics, 
United States Department of Agriculture, pp. 1-5 (approximately). 

b) The net profits of corporations as compiled by the Federal Reserve
Bank of New York. Survey of Current Business, 1938 Supplement,
United States Department of Commerce, pp. 64 and 180. 

c) The value of production of manufactures in the United States. Biennial 
Census of Manufactures, any issue, United States Department of Com- 
merce. (The description of method is found at different places in 
different issues. In the 1925 Census, for example, the description of 
method is found on pp. 3-6.) 

d) The loans and investments of reporting member banks of the Federal 
Reserve System in 101 cities. Survey of Current Business, 1938 Sup- 
plement, United States Department of Commerce, pp. 55 and 178. 

5. State in each of the following examples of collection whether agents or 
mail questionnaires should be used and whether the census or sample 
method should be used. Give reasons for answers in each case. 

a) A city welfare organization wished to make an investigation of the 
extent to which families receiving city relief were paying money on 
installment purchases. 

b) A city restaurant association wished to study the distribution of ex- 
penses of doing business of its 53 members. 

c) An advertising agency wished to inquire from the owners of a certain 
make of automobile whether they would purchase the same make of 
car again. 

d) A corporation wanted information concerning how many of its 4,500 
employees were home owners, the value of their homes, and where the 
homes were located. 

6. In each of the following examples the student is expected to lay out 
a preliminary plan for the collection of data, giving explanations of pro- 
cedure and reasons for choice where alternate methods are available. 

a) A study of vacant dwellings in the community in which your college 
or university is located. The purpose of the study will presumably be 
to determine: (1) the percentage of dwellings vacant, (2) what types 
of dwellings have the highest and lowest vacancy ratios, (3) the sec- 
tions of the community having the highest and lowest vacancy ratios, 
(4) the relation of vacancy to age of dwellings, (5) allied questions 
that you may care to include. 

b) An automobile manufacturer advertising in newspapers and maga- 
zines, on billboards, and by radio wishes to discover which type of 
advertising is most effective in drawing the attention of the public to 
his product. 

c) A manufacturer of a well-known brand of toilet soap wishes to dis- 
cover by a direct appeal to consumers why the sales of his product have 
declined during the past year. 

d) A state milk control board wishes to find the variations in the price 
at which whole milk is sold in retail stores in the state. 







CHAPTER V 
SAMPLING 

RELATION TO KNOWLEDGE 

MUCH of the world's knowledge is based upon inferences 
drawn from observation of samples. Finding the skeleton 
of a giant mammal embedded in rock strata demonstrated 
to have been on the surface of the earth 100,000 years ago, the 
paleontologist deduces the fact that such an animal lived in that period 
and then generalizes that this animal was typical of many alive at the 
time. One example has been found; therefore many others like it 
must have existed. A lumberjack taps along the side of a fallen tree
with his axe and, listening to the sound, determines how far the tree is 
hollow and just where it becomes solid to the heart. His past experience 
in tapping logs represents a large sample providing the knowledge to 
be applied to the new log and his judgment is usually correct. A public 
speaker, wishing to drive home a point to his audience, illustrates with 
a story or an experience because he has found that this method of 
emphasis is the most effective. The response he has obtained in the 
past represents a sample the results of which are a part of his platform 
technique. These illustrations exemplify the extent to which sample 
experience becomes the guide to current action. Similar examples 
could be cited in every field of knowledge. The notion of sampling 
and the generalization of the results of sampling are in no sense 
peculiar to statistical work. Sampling is, however, particularly impor- 
tant in the field of statistics, because the numerical character of the 
subject lends itself to exact development. 

THE IMPORTANCE OF SAMPLING 

Sampling techniques are seldom necessary in internal statistical work
but have their greatest application in external work. In the latter case
it is seldom possible to obtain all of the data pertinent to a given
statistical universe, 1 hence the usual situation requires that results be
obtained from the study of samples. Thus we have: indexes of commodity
prices based on a few hundred of the thousands of commodities
that are traded daily; average hourly wage rates in manufacturing
plants determined from samples including no more than 5 per cent
of factory workers; the market for a product estimated from the results
of sending a questionnaire to 1 or 2 per cent of the potential users.
These examples are sufficient to indicate the importance of sampling
in statistical work.

1 The complete category of data from which a sample is drawn is known as a
statistical universe or statistical population. In the preceding chapter an example was
presented in which prices of groceries were collected monthly from 25 grocery stores in
a city. The 25 stores are a sample of the universe or population consisting of all grocery
stores in the city.

When a retailer buys shoes from a salesman he expects that they will 
be just like the sample which the salesman shows. In the same way 
we might expect to estimate the average age of 2,000 freshmen in a 
university from the ages of a sample of 100 of them attending a 
freshman lecture. Certain differences between these two "samples" 
will immediately occur to the reader. The shoes are all made on the 
same machinery, defects are weeded out by inspection, and uniformity 
is assured at every step of the manufacturing process. Therefore, 
one shoe picked at random does represent the entire lot. On the other 
hand we cannot be sure that the sample of 100 freshmen is repre- 
sentative as to age. The lecture may have attracted only more mature 
students, or a brilliant younger group who completed high school 
in three years. 

The absence of control over statistical data is precisely what makes
it necessary to develop principles and methods of sampling. We wish
to know something about a certain universe of events or facts, but
are unable to make a complete enumeration. Instead we must record
specific facts concerning a sample drawn from the universe, a sample
which shall be representative of the universe. The problem is, How
can such a sample be obtained? 



THE PRINCIPLE OF STATISTICAL REGULARITY 

If our knowledge of the universe is limited how can we ever know 
that a sample drawn from it is representative? The answer to this 
question comes from a principle which is as broad in its application 
as the laws of nature. It is known as the Principle of Statistical Regu- 
larity and may be stated thus: A sample selected at random from a 
universe will exhibit the characteristics of the universe, even though 
the number in the sample is small compared with the universe. The 
simplest illustrations of the operation of the principle occur in coin
tossing and dice rolling. Every throw is exactly like every other one
and the experimental material, i.e., coins or dice, remains constant.
The result of an experiment with coins is presented in Table 6. 
Ten coins were used and the results in the first four lines of the table 
are for groups of 50 throws each, or 500 coins. The 240 heads and 
260 tails obtained in the first trial of 500 varied 4 per cent from the 
expectation of 250 each. The second trial gave 245 heads and 255 
tails, a cumulative result of 485 heads and 515 tails in the first 1,000 
coins. The cumulative result varies 3 per cent from the expected 500 
of each. In successive rows of the table the results for the third and 
fourth trial of 50 throws, the third and fourth hundred throws, and 

TABLE 6
THE PRINCIPLE OF STATISTICAL REGULARITY ILLUSTRATED BY COIN THROWING

                                                  CUMULATIVE
NUMBER OF          RESULT           ACTUAL RESULT      EXPECTED RESULT    PER CENT VARIATION
THROWS OF TEN                                          (Equal Number of   FROM EXPECTED
COINS EACH      Heads    Tails      Heads     Tails    Heads and Tails)   RESULT

1st 50            240      260        240       260          250               4.00
2d 50             245      255        485       515          500               3.00
3d 50             253      247        738       762          750               1.60
4th 50            246      254        984     1,016        1,000               1.60
3d 100            501      499      1,485     1,515        1,500               1.00
4th 100           539      461      2,024     1,976        2,000               1.20

1st 400         2,024    1,976      2,024     1,976        2,000               1.20
2d 400          1,925    2,075      3,949     4,051        4,000               1.28
3d 400          1,999    2,001      5,948     6,052        6,000                .87
4th 400         2,036    1,964      7,984     8,016        8,000                .20
5th 400         2,009    1,991      9,993    10,007       10,000                .07
6th 400         2,007    1,993     12,000    12,000       12,000                .00
7th 400         2,015    1,985     14,015    13,985       14,000                .11
8th 400         2,000    2,000     16,015    15,985       16,000                .09
9th 400         1,993    2,007     18,008    17,992       18,000                .04
10th 400        1,959    2,041     19,967    20,033       20,000                .17
11th 400        2,013    1,987     21,980    22,020       22,000                .09





so on are shown. The last column of the table shows how the per- 
centage variation from the expected number tends to decrease as the 
size of the cumulative sample increases. At the end of the first 400 
throws (4,000 coins) the variation is 1.20 per cent. At the end of the 
second 400 throws the variation increases slightly to 1.28 per cent and 
then decreases regularly through the third, fourth, and fifth groups 
of 400 throws and reaches zero at the end of 2,400 throws. The exact 
result obtained at this point is purely accidental, as is the exact way 
in which the percentage variation declined with the increase in size
of the sample. The important point is that the percentage variation
becomes smaller and smaller as the size of the sample increases and
that in spite of slight deviations it remains small through the seventh,
eighth, ninth, tenth, and eleventh trials of 400 throws each. 2

Examination of the result columns shows that sometimes the num- 
ber of heads is greater than the expected number and at other points 
the number of tails is greater than expected; there is no indication of 
any fixed bias nor any tendency for either heads or tails always to 
exceed expectancy. Specifically, for instance, after 400 throws there 
were more heads than expected, but after 800 throws there were more 
tails than expected. This difference in the direction of variation to- 
gether with the regular reduction in the percentage variation indicates 
the tendency toward regularity of the results as the size of the sample 
increases. The tossing of 44,000 coins has demonstrated the principle. 
If the tossing were continued, the percentage variations could be ex- 
pected to diminish. 
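
The last column of Table 6 can be recomputed from the cumulative counts. The sketch below takes the per cent variation to be the difference between the cumulative number of heads and the expected number, expressed as a percentage of the expected number; this reading agrees with the 4 and 3 per cent figures worked out in the text. The function name `percent_variation` is hypothetical.

```python
def percent_variation(heads, tails):
    """Per cent by which the cumulative count departs from an equal split."""
    expected = (heads + tails) / 2
    return abs(heads - expected) / expected * 100


# Cumulative (heads, tails) from Table 6 after each successive group of 400 throws.
CUMULATIVE = [
    (2024, 1976), (3949, 4051), (5948, 6052), (7984, 8016),
    (9993, 10007), (12000, 12000), (14015, 13985), (16015, 15985),
    (18008, 17992), (19967, 20033), (21980, 22020),
]

if __name__ == "__main__":
    for heads, tails in CUMULATIVE:
        coins = heads + tails
        print(f"{coins:>6} coins: {percent_variation(heads, tails):.3f} per cent")
```

The figures are printed to three decimals; the table rounds them to two. The decline is not perfectly regular, as the text observes, but the variation remains well under one-fifth of 1 per cent from the fourth group of 400 throws onward.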

A further demonstration of the operation of the principle of statis- 
tical regularity is presented in Table 7, showing the results of throwing 
five dice. The first line of the table gives the results of the first 20 
throws of five dice (100 faces showing or, as recorded in the first 
column, 100 dice). The maximum variation from the expected number
is the appearance of 22 fives. This variation of 32 per cent is due
to the small size of the sample. The decline in the variation of the 
actual from the expected result can be seen as the size of the sample 
is increased. 

There are two places at which the progressive decline in variability
is broken: when the cumulative sample consists of 300 dice and when
it consists of 3,600 dice. These two exceptions to the operation of the
principle of statistical regularity do not disprove its universality. 
The experiment was carried on with ordinary commercial dice and 
they were thrown within a confined space rather than being permitted 
to come to rest without obstruction. Either circumstance might be 
sufficient to explain the two irregularities that appear. 

In both examples the expected occurrence of the recorded events is
known. That is, coins should fall heads and tails with equal frequency;
one face of a die is as likely to turn up as another. Consequently
these are controlled experiments carried out to show how the principle
of statistical regularity operates.

[TABLE 7, showing the results of throwing five dice: contents not recoverable.]

2 The reliability of a sample increases proportionally to the square root of the number
of cases in the sample. Thus to double the reliability, i.e., to halve the variability,
the number of cases must be four times as great. The reason for this relation will be
apparent from the form of the formulas for standard error in chapter XXIX.
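
The square-root relation stated in the footnote on the reliability of a sample can be illustrated numerically. The sketch below assumes, as the measure of variability, the usual standard error of a proportion, the square root of p(1 - p)/n, with p = 0.5 for a fair coin. That formula is a standard result brought in here for illustration and is not developed in this chapter; the function name is hypothetical.

```python
import math


def standard_error_of_proportion(n, p=0.5):
    """Standard error of a sample proportion based on n independent cases."""
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    base = standard_error_of_proportion(1000)
    for n in (1000, 4000, 16000):
        se = standard_error_of_proportion(n)
        # Ratio to the standard error at n = 1,000: each quadrupling halves it.
        print(f"n = {n:>6}: standard error = {se:.5f}, ratio to n = 1,000: {se / base:.2f}")
```

Each fourfold increase in the number of cases cuts the standard error in half, which is the sense in which reliability grows only as the square root of the size of the sample.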

Consider another example. A teacher made a practice each year 
of having each of his students measure the width of his desk. The 
same ruler was used year after year, but the students' results varied 
individually and from year to year. The ruler was subdivided by 32ds 
of an inch and the students were told to read to 64ths of an inch. 
The yearly average for eight years is shown in Table 8. 

TABLE 8

WIDTH OF A TEACHER'S DESK ACCORDING TO MEASUREMENTS BY
EIGHT DIFFERENT GROUPS OF STUDENTS

YEAR      NUMBER OF STUDENTS      AVERAGE MEASURED WIDTH OF DESK IN INCHES

1st               32                          48.6[?]
2d                26                          48.49
3d                30                          48.61
4th               31                          48.60
5th               28                          48.37
6th               36                          48.39
7th               31                          48.62
8th               28                          48.60



The exact width of the teacher's desk is unknown, yet these averages 
perform in the same way as the observations of coin and dice throw- 
ing. As the number of heads sometimes exceeded the number of tails 
and vice versa, so in the same way some of these averages were slightly 
above the theoretically true width, 3 others slightly below.

The example shows that results from samples tend to group them- 
selves about an unknown true value just the same as they group 
themselves about a known true value. The coins and dice are samples 
in which the observations involved only counting. The desk data 
involved measurement. We conclude therefore that the principle of 
statistical regularity applies to both counted and measured samples. 

A major question arises at this point. Will the same regularity 
appear when the universe is less homogeneous 4 than coins, dice, or 

3 The theoretically best estimate for an unknown value is the arithmetic average of a
large number of independent observations of that value. 

4 "Homogeneous" as used in statistical work means sufficiently alike to be used for 
the immediate purpose as though equivalent. For example, coins and dice are truly homo- 
geneous in the sense that each one is identical with every other one, but human beings, 
animals, and other materials dealt with in statistical work are treated as though they were 
homogeneous even though appreciable differences of size, weight, and other characteristics 
appear within the groups. "Non-homogeneous," or "heterogeneous," means possessing 
characteristics which are sufficiently different to require classification in different categories. 




the width of a desk? The answer can be obtained from another ex- 
ample. The problem was to find the average number of letters in the 
last names of persons having telephones in Buffalo, New York. Each 
page of the telephone book contained four columns. A ruler was laid 
across each page of the book near the middle of the page and the 
number of letters in the name appearing above the ruler in each column 
was counted. Four samples were taken, the first containing a name 
from the first column of each page of the book, the second a name 
from the second column and so on. There were 265 pages in the book, 
hence each sample contained 265 items. The average numbers of letters 
in the names in the samples were: 



1st sample     6.51 letters
2d sample      6.52 letters
3d sample      6.51 letters
4th sample     6.54 letters



The similarity of the four results shows how the principle of statistical 
regularity operates. These samples were chosen entirely at random, 5 
yet any one of them alone presumably would have represented the 
universe. 

Each sample contained only 265 out of a total of about 70,000 
names in the telephone book, yet there is little doubt that the average 
number of letters in the last names appearing in the book is about 6.5. 
Note that we do not expect the sample to give the exact characteristics 
of the universe but rather an approximate indication of those char- 
acteristics. The four samples vary slightly and probably each one varies 
somewhat from the true value. Such variations will always appear in 
samples. In fact the methods of analyzing data which will be devel- 
oped in subsequent chapters include the measurement of the expected 
variation of the characteristics of a universe from those characteristics 
found in a sample drawn from it. 
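
The agreement among the four column samples can be checked with a few lines of arithmetic. The sketch below uses only the four averages reported above; the variable names are illustrative. Because each sample has the same size (265 names), the average of the four averages is also, apart from rounding, the average of all 1,060 names drawn.

```python
from statistics import mean

# Average letters per surname in the four column samples reported above.
sample_means = [6.51, 6.52, 6.51, 6.54]

overall = mean(sample_means)                    # average of the four averages
spread = max(sample_means) - min(sample_means)  # widest disagreement between samples

print(f"mean of sample means: {overall:.2f}")
print(f"range across samples: {spread:.2f}")
```

A range this small relative to the average is what the text means by statistical regularity; it is not by itself a measure of how far the samples lie from the true value.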

In the previous examples chance has operated in each case. The 
chances are equal of getting either heads or tails in tossing a coin; 
the chance that any one face of a die will turn up is one-sixth. In 
measuring the width of a desk overestimates and underestimates are 
equally likely. In the telephone book there is just as much chance that 
"Fry" will be printed near the middle of the page, as "Frendenberger." 
The cases which arise in practical business affairs are usually not 

5 The concept of random selection of cases for a sample is explained in chapter IV, 
p. 58, footnote 3. 




so simple as these examples. The operation of pure chance which is 
so evident in the examples will be for the most part lacking in prac- 
tical work. The investigator is forced to deal with conditions as they 
exist. In general more variables will be present and as a result adjust- 
ments become necessary. The real problem of sampling is to find 
methods of selecting the cases for the sample so that the characteristic 
to be measured or counted has a chance of occurring in the sample 
in the same proportion as it occurs in the universe. The amount of 
care required to do this will be evident when it is remembered that 
the extent of occurrence of the characteristic being studied is unknown 
in the universe and can only be inferred as the final step in the analysis 
of the sample. This would be circular reasoning were it not for the 
principle of statistical regularity. Some of the characteristics of the 
universe may already be known, and if these known conditions of 
the universe can be reproduced on a small scale in the sample, then the 
operation of the principle is all that is needed to allow us to infer 
from its occurrence in the sample the extent to which a given unknown 
characteristic is present in the universe. 

THE TWO PROBLEMS OF SAMPLING 

There are two major factors to be considered in obtaining a sample: 

(1) how many cases must be included to obtain reliable results and 

(2) what cases must be included to secure representativeness. 

The Size of a Sample 

The first problem is, How many or what proportion of the cases 
in the universe must be taken for the principle of statistical regularity 
to operate? There is no numerical answer to this question. It would 
be wrong to say that a 50 per cent sample or a 10 per cent sample 
will be satisfactory. In fact such an answer is meaningless in coin 
tossing where the universe is infinite. Even when the universe is lim- 
ited, as in the telephone book example, we do not attempt to say that 
a certain number of cases or a certain percentage of the total number 
of cases in the universe will be a large number. The telephone book 
used in this test experiment contained about 70,000 names. A sample 
of 265 was therefore only about four-tenths of 1 per cent, yet the 
results obtained from the four independent samples fell within a very 
narrow range. 
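
The sampling fraction quoted above can be verified directly. This is a minimal sketch; the universe size is the approximate figure of 70,000 names given in the text, so the result is approximate as well.

```python
sample_size = 265        # one name per page, from the text
universe_size = 70_000   # "about 70,000 names", an approximate figure

fraction = sample_size / universe_size
print(f"sampling fraction: {fraction:.2%}")
```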




The question of how many cases to include in a sample must be 
decided for each problem separately. The number depends primarily 
on the degree of reliability required and the diversity of the charac- 
teristics present in the universe. The tests for reliability are developed 
in detail in chapter XXIX. The question of diversity of characteris- 
tics can be discussed at this point. If the universe is as strictly homo- 
geneous as the letters in names in a telephone book, a very small 
sample will suffice for the purpose of determining the average number 
of letters per name. On the other hand, if a sample from the telephone 
book were used to determine the percentage of subscribers who used 
four-party service, a much larger sample would be required to insure 
that proper provision was made for the tendency toward use of this 
type of service in different parts of the city, for the inclusion of mainly 
residential subscribers since few business places use four-party service, 
and for the exclusion of those exchanges, if any, which do not provide 
four-party service. 

This example demonstrates further the importance of a previous 
statement, that the question of homogeneity of the universe depends 
upon the purpose for which the sample is taken. Thus only a relative 
statement can be made concerning the size of a sample. If the events 
in the universe differ only with respect to the characteristic which is 
tested by the sample, a sample as small as one-tenth of 1 per cent of 
the universe may be adequate for the principle of statistical regularity 
to be effective. As the number of characteristics which vary in the 
universe increases, the size of the sample must be increased, sometimes 
becoming as large as 10 per cent of the universe. If a sample greater 
than 10 per cent is required to reproduce the characteristics of the 
universe, the universe itself is probably not sufficiently homogeneous 
for the principles of sampling to be used. 

Methods of Securing a Representative Sample 

The second problem is how to secure a representative sample. In 
particular, What cases shall be included in order to set up in the sample 
a pattern which will reproduce on a smaller scale the conditions of the 
universe? In some cases none of the conditions of the universe may 
be known, while in other instances information is already available 
concerning the distribution of certain characteristics. Consequently 
there are two methods of securing representativeness: (1) uncontrolled 
sampling and (2) controlled sampling. 




Uncontrolled sampling. If little or nothing is known about the 
distribution of any of the characteristics in the universe, an uncontrolled 
sample is the only one which can be used. 

Example 1: Suppose a tobacco retailer wishes to make a consumer 
investigation of the question, "What brand of cigarettes is most pop- 
ular in this city?" It would be difficult to find a "control" in this case, 
because nothing is known regarding the characteristics of cigarette 
smokers as a group of the population. It is known in general that 
children do not smoke cigarettes, but just what the proportion of 
cigarette smokers is in each adult age group would be hard to estimate. 
It is not even known how the percentage of men cigarette smokers 
compares with the percentage of women smokers. If there had been 
some recent nation-wide study showing what percentage of each sex 
smokes cigarettes, these two percentages would provide a control to 
determine the proportionate number of men and women from whom 
replies should be obtained in this study. 

In the absence of any such control, the most obvious method is to 
take cases from the universe as they come to hand, making no choice 
of any kind. Even this involves some selection of time and place for 
taking interviews. The method must insure a rough representation of 
all the general characteristics of the adult population, on the assump- 
tion that, provided the sample is large enough, smokers of various 
brands of cigarettes will be included in correct proportion. That is, 
since age, sex, nationality, economic class, or other characteristics may 
be determining factors in the choice of brand of cigarettes, the answers 
must come from persons who are representative of the total adult popu- 
lation in as many respects as possible. It would not serve the purpose, 
therefore, to distribute questionnaires only at women's clubs, or only 
at an industrial plant, or to interview only relief clients, or only people 
on the street at three o'clock in the afternoon. But if a busy down- 
town corner were selected, at an hour when all classes of men and 
women, employed as well as unemployed, are likely to be on the street, 
the passers-by should be fairly representative of the entire adult popu- 
lation. If stopped and asked what brand of cigarette they buy, 
some would reply, others would ignore the question; some would be 
cigarette smokers, others would not smoke cigarettes or would not 
smoke at all. The investigation might show that six hundred and 
thirteen cigarette smokers replied and the answers were tabulated as 
follows: Brand "A," 18 per cent; Brand "B," 16 per cent; and so on. 




The important feature of this method is the absence of control of 
the sample. Experiments have shown that reliable results can be 
obtained by this method only through the use of a large sample. The 
distribution of none of the characteristics in the universe is known; 
therefore a large sample must be taken to insure that the pattern of 
the universe will be reproduced. This uncontrolled plan of collection 
is known also as the extensive method of sampling. 

Example 2: The use of this method in an investigation is illustrated 
by the chain store inquiry of the Federal Trade Commission conducted 
in 1928 and published in 1930-31. The sample was obtained in the 
following manner: 

A mailing list of chain stores was prepared by the commission from various 
lists of chains, including those of the Chain Store Age and the National Asso- 
ciation of Real Estate Boards, supplemented by telephone directories, trade 
journals, and city directories, all of which were checked to eliminate, so far as 
possible, duplications. When completed, this mailing list for the selected groups 
of chain stores included slightly over 7,500 names. The results obtained from 
this mailing list are shown in the following tabular statement: 

Schedules mailed                                                 7,515 

  Returned by post office                                   713 
  Duplications                                              638 
  Non-chain establishments only                           1,282 
  Co-operative group only                                    39 
  Reported out of business                                  492 
  In receivership, no records, or records destroyed, etc    833 
  Unobtainable at time of tabulation                      1,596 

    Total eliminated                                             5,593 

Schedules returned                                               1,922 6 
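
The tabular statement can be re-added as a check on its totals. The sketch below uses only figures given in the Commission's statement and in the sentence that follows it (1,727 usable schedules); the two percentage rates are derived here for illustration and are not quoted from the Commission.

```python
mailed = 7_515
eliminated = {
    "returned by post office": 713,
    "duplications": 638,
    "non-chain establishments only": 1_282,
    "co-operative group only": 39,
    "reported out of business": 492,
    "in receivership, no records, or records destroyed": 833,
    "unobtainable at time of tabulation": 1_596,
}

total_eliminated = sum(eliminated.values())
returned = mailed - total_eliminated
usable = 1_727  # usable schedules, as stated in the text following the table

print(total_eliminated, returned)
print(f"return rate: {returned / mailed:.1%}")
print(f"usable share of returns: {usable / returned:.1%}")
```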



Only 1,727 of the 1,922 schedules were usable in the analysis but the 
Commission appraised their representativeness as follows: 

Comparing the commission's data with estimates for the entire field based 
upon census data, it appears that the commission's study represents approximately 
one-half of the number of stores operated and one-half of the aggregate sales 
volume of all organizations engaged in chain-store merchandising in 1929 in 
the 26 kinds of business covered by this inquiry, including chains of two and 
three stores, which are not classed by the census as chain stores. On the other 
hand, the total number of chains represented in the commission's inquiry is 
estimated to be something under 10 per cent of the total. 7 

6 "Scope of the Chain-Store Inquiry," Chain Stores, 72d Congress, 1st Session, Senate 
Document No. 31, p. 9. 

7 Ibid., p. ix. 






The Commission had to treat its data in different classifications, 
hence the real problem of representativeness arose in the sub-groups. 
A comparison of sample data with Census of Distribution data based 
on the parts of each which were considered comparable is shown in 
Table 9. 

TABLE 9 

PERCENTAGE OF TOTAL CENSUS CHAINS (FOUR STORES AND UP), STORES, AND SALES IN 
1929 REPRESENTED IN THE COMMISSION'S ORIGINAL CHAIN-STORE SCHEDULE RETURNS 
FOR CHAINS OF SIX STORES AND UP FOR 1928* 

                                         PER CENT REPRESENTATION OF 
                                       CENSUS IN COMMISSION'S SAMPLE 
KIND OF CHAIN                              Chains   Stores    Sales 

Food                                         23.8     76.9     76.5 
Drug                                         17.7     42.8     55.5 
Tobacco                                      17.2    109.8    104.0 
Variety                                      44.9     80.4     89.8 
Clothing, furnishing, and accessories        20.8     30.6     34.7 
Hats, caps, and millinery                    20.6     33.7     52.9 
Shoe                                         34.4     55.6     56.8 
Department store and dry goods               28.7     53.5     97.8 
General merchandise                           6.4      9.3      7.8 
Furniture                                     4.0      7.6     11.9 
Musical instruments                          12.1     15.7     24.8 
Hardware                                     13.3     25.3     22.0 

Total                                        21.8     66.3     69.2 

* "Scope of the Chain-Store Inquiry," Chain Stores, 72d Congress, 1st Session, Senate 
Document No. 31, p. 28. 



Table 9 is complicated by the fact that the ratios in the three 
columns have in the numerator results from the Federal Trade Com- 
mission's sample of chains operating six or more stores in 1928 and 
in the denominator results from the Census of Distribution, a complete 
enumeration of chains operating four or more stores in 1929. From 
this comparison the Commission concluded: 

The purpose of Table [9] obviously is not to present an exact measure of 
the proportions of the commission's data either of the chain-store field as a 
whole or by specific commodities but rather to afford a general impression as 
to the kinds of business in which the commission data may be regarded as 
sufficiently comprehensive as contrasted perhaps with those for which the figures 
should be regarded merely as indicative because of the comparatively small 
representation in comparison with the census totals. 

It should, of course, be recognized that the foregoing proportionate com- 
parisons are approximations, both because of the variations in classification and 
the necessary treatment of the 4- and 5-store chains in the commission data. 

In general, it appears that with the possible exception of general stores and 
furniture chains, the commission's reports are sufficiently adequate to provide a 




satisfactory indication of chain store operations in the several kinds of business 
considered. 8 

This sample was obtained without exercising any control over the 
cases which should be included. As a result only part of the organiza- 
tions to whom the questions were sent proved to come within the 
definition of chain merchandisers. The ultimate size of the sample 
was unknown until the returns had been edited. Even when the size 
of the sample as a whole was known the representativeness of the 
sample with respect to different kinds of chains was in doubt until 
a partial comparison could be made when the results of the 1929 
Census of Distribution were published. Finally it turned out that in so 
far as the comparisons with census results were valid, the various lines 
of trade were not equally well represented in the sample, although 
with two exceptions the sample was considered large enough to provide 
representative information concerning chain-store merchandising in 
different lines of trade. 

Controlled sampling. When knowledge of some of the character- 
istics of the universe can be obtained, the usual practice is to take a 
controlled sample. A controlled sample is one in which representative- 
ness is obtained by conscious adjustment of the sample to conform to 
the conditions existing in the universe according to one or more known 
characteristics. The known characteristics are not the ones that are 
being studied in the sample investigation. For instance, in a survey of 
buying habits of students at a certain university, the number registered 
in each class is a matter of record at the registrar's office. This known 
distribution can be used in selecting a representative number of sample 
cases from each class, and in order to check with this control each 
student interviewed must be asked what class he or she is in. How- 
ever, the object of the investigation is not to determine the distribution 
of students by class, but rather to assemble, by means of sampling, a 
variety of hitherto unrecorded information regarding the buying habits 
of all the students. 
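
The use of the registrar's class counts as a control can be sketched as a proportional allocation of interviews. The enrollment figures and the sample size of 250 below are hypothetical, chosen only for illustration, and the largest-remainder rounding is one reasonable way, not the only way, of making the class quotas add up to the intended total.

```python
def proportional_allocation(universe_counts, sample_size):
    """Split sample_size across groups in proportion to universe_counts.

    Uses largest-remainder rounding so the quotas sum exactly to sample_size.
    """
    total = sum(universe_counts.values())
    exact = {g: sample_size * n / total for g, n in universe_counts.items()}
    quotas = {g: int(x) for g, x in exact.items()}  # round down first
    shortfall = sample_size - sum(quotas.values())
    # Hand the leftover interviews to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:shortfall]:
        quotas[g] += 1
    return quotas


# Hypothetical registrar counts (not from the text).
enrollment = {"freshman": 1200, "sophomore": 1000, "junior": 900, "senior": 700}
quotas = proportional_allocation(enrollment, sample_size=250)
print(quotas)
```

Each student interviewed is then asked his or her class, so that the completed interviews can be checked against these quotas.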

The advantages of the controlled method lie (1) in the substitution 
of a known representativeness for one hoped for on the basis of size 
of sample alone and (2) in the small size of sample which can be 
used. If the information had been available for the chain-store inquiry 
to have followed this plan of sampling, the first step would have been 

8 Ibid., pp. 28, 30. 




a study of existing information concerning chain stores to obtain for 
each line of trade covered by the investigation the best estimate of the 
number of chains, the proportion of large and small chains, the dis- 
tribution of sales by lines of trade, and the total sales. Guided by this 
information the sample could have been planned so as to get the 
proper representation of large and small chains and of the several 
lines of trade. Thus a reasonable amount of control over the sample 
would have been exercised. 

Controlled sampling may be further differentiated according to the 
degree of selection used in establishing the controls. If there is 100 
per cent selection, leaving no elements to chance in determining the 
actual cases that appear in the sample, the method is called selective 
sampling. If a certain degree of selection is used, but the final deter- 
mination of actual cases is left to chance, this is called the inclusive 
method. Examples of the latter will therefore cover the entire range 
between uncontrolled, or extensive, sampling in which the distribution 
of none of the characteristics of the universe is known, and selective 
sampling, in which each individual case is picked because of the known 
representativeness of its general characteristics. 

The selective method: An example 9 will show the method by which 
the investigator, on the basis of his knowledge of the universe, hand- 
picks a small number of cases which he believes will be a representa- 
tive sample. 

Five years ago Mr. William Groom of the Thompson-Koch Company was 
interested in the possibility of measuring directly in terms of sales the effective- 
ness of the advertising produced by his agency. Mr. Groom selected four 
middle-western cities of 35,000 to 50,000 population. He planned to run his 
experimental campaigns in the newspapers in these cities and to measure results 
in terms of the sales made through the local drug stores. To this end Mr. Groom 
enrolled from three to a dozen or more drug stores in each of his cities and 
paid them to submit each month a statement of the sales made during the 
month of each of a number of different drug items. 

In order to be able to generalize from the results, it is, of course, desirable 
to make any study of retail sales in communities which are more or less repre- 
sentative. Mr. Groom originally chose his four test towns on the basis of a 
personal knowledge of the communities, and a belief that these communities 
were fairly representative of a great part of the country. 

The thought in using cities of this size and character was that they presented 
a mixture of urban and rural people and problems and, therefore, were more 

9 Lyman Chalkley, Jr., "The Flow of Sales through Retail Drug Stores: A Factual 
Study," Harvard Business Review, Vol. XII, No. 4 (July, 1934), pp. 427-29. 




representative of the whole country than either the larger metropolitan centers 
or the purely rural districts. Each city has some manufacturing, some farming, 
and some general business and professional activities proportioned roughly 
like those in the country as a whole. 

Although only four cities were included in the study, the investi- 
gators expected their results to be representative of the country as a 
whole. Further the records of 23 drugstores out of a total of about 
100 in the four cities were used. These 23 were personally selected by 
Mr. Groom. Finally the sales of 12 selected items became the data of 
the study. The selection, in every respect, of the cases to be included 
in the study is the important point of the example. 

This description of the procedure immediately gives rise to three 
questions: (1) Were these four towns representative of the country 
as a whole as regards the relation of sales to advertising? (2) Were 
the sales of the 23 stores representative of sales of all drugstores in 
the four communities? (3) Were the 12 items the proper ones to 
study? The equivalent of these questions must be raised with respect 
to the plan for any selective sample. When the investigator feels that 
he has sufficient knowledge of his universe and of his sample to be 
able to answer such questions, there is some justification for the use 
of the selective method of sampling. Under the usual practical condi- 
tions no such assurance is possible, hence the method should be used 
sparingly. The danger lies in getting a biased result if the selection 
should go astray at any stage of planning the sample. 
The inclusive method: Two examples will illustrate various degrees 
of control in securing an inclusive controlled sample. 

Example 1: An advertising firm was asked to make a quick survey of 
the nation-wide popularity of a certain brand of scouring powder. The 
firm first selected a few cities which were believed to be representative 
of the entire country. Local supervisors were appointed, each of whom 
was familiar with conditions in her own city, and they were asked to 
select sample areas that would be representative of nationality groups, 
old and new residential districts, etc., but each containing a variety of 
income levels. One agent was assigned to each area and was given 
freedom to select the housewives whom she would interview, except 
that her total number of interviews must be divided approximately 
into 20, 40, 30, and 10 per cent of four roughly defined economic 
classes. (In certain areas adjustments of the required proportions were 
made according to the known economic levels in the neighborhood.) 




If toward the end of an assignment an agent found that she had 
secured a markedly unbalanced proportion of interviews according to 
economic class, she then had to exercise some degree of selection in 
choosing the blocks and houses to visit so that her remaining schedules 
would make up the deficiency. For the most part, however, if the 
agents used some system of random selection such as calling at every 
third house or canvassing every other block, they found that they usu- 
ally had the right proportion of interviews without making any con- 
scious adjustment. 
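
The check an agent makes toward the end of an assignment can be sketched as a comparison of completed interviews with the quota for each economic class. The 20, 40, 30, and 10 per cent shares come from the text; the class labels, the assignment of 50 interviews, and the tallies of completed interviews are hypothetical.

```python
# Target percentage for each of the four economic classes (shares from the text;
# the labels "class 1" to "class 4" are placeholders).
target_pct = {"class 1": 20, "class 2": 40, "class 3": 30, "class 4": 10}


def remaining_quota(assigned_total, completed):
    """Interviews still needed in each class to reach the target shares."""
    needed = {}
    for cls, pct in target_pct.items():
        target = round(assigned_total * pct / 100)
        needed[cls] = max(target - completed.get(cls, 0), 0)
    return needed


# Hypothetical assignment: 50 interviews, 40 already completed.
completed = {"class 1": 6, "class 2": 20, "class 3": 12, "class 4": 2}
print(remaining_quota(50, completed))
```

In this illustration the agent would direct her last ten calls to blocks likely to yield the first, third, and fourth classes, since the second class is already filled.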

This method permitted some degree of selection according to certain 
general characteristics within each of three controls of the 
investigation (city, area, and economic group). In spite of this fact, most 
of the housewives interviewed were chosen solely by chance: they hap- 
pened to be at home; they happened to live in a block near the car 
line where the agent started to canvass, etc., all of which were factors 
that had no effect whatever on their choice of scouring powder. The 
key idea in this method was that all of the types of families were 
given a chance to appear in the sample in proportion to the number 
of each type existing in the community. All of the characteristics of 
the universe were given a fair chance to be included in the sample. 
Beyond that point no selection was made of the individual cases that 
were actually taken. 

Example 2: In this investigation 10 by inclusive sampling, selection 
was exercised only at the first level of the plan, and at later stages the 
returns were determined wholly by chance. 

An investigation was made by the Bureau of Business Research [of the 
University of Pittsburgh] in the spring of 1931 to determine the cost and 
the quality of housing accommodations secured by salaried workers em- 
ployed in downtown Pittsburgh. The housing status of 1,415 families was 
analyzed. 

The data for this study were secured by means of questionnaires distributed 
to salaried workers through their employers. The co-operation of the following 
types of concerns with offices in downtown Pittsburgh was secured for the 
distribution of the questionnaires: two public utilities, four department stores, 
five financial institutions, five industrial concerns, one railroad, and two insur- 
ance agencies. 

It is believed that the employees of these concerns represent a fair cross- 
section of the salaried workers employed in downtown Pittsburgh. 

10 Theodore A. Veenstra, "Housing Status of Salaried Workers Employed in Pitts- 
burgh," University of Pittsburgh Bulletin, Vol. XXVIII (June 10, 1932), pp. 1-4. 




Questionnaires were distributed among salesmen, accountants, clerks, statis- 
ticians, engineers, and junior executives. In order to get replies from the type 
of worker selected for the study, co-operating concerns were asked to distribute 
the questionnaires to employees described as follows: "Heads of families en- 
gaged in clerical and executive work in downtown Pittsburgh with salaries of 
$5,000 annually or less." In general the persons reporting were heads of 
families engaged in the designated types of work. Salaries in a number 
of cases were in excess of $5,000; but such cases, if otherwise acceptable, were 
included in the study. 

A large proportion of the 1,385 persons reporting occupations were en- 
gaged in clerical work. Those having executive, technical, selling, and account- 
ing positions were next in order in numbers reporting. Other groups were only 
sparsely represented. 

The universe from which this sample was taken included only 
salaried workers in offices in downtown Pittsburgh who were heads 
of families. The $5,000 limit automatically excluded high-salaried 
executives. Thus the limits of the investigation were rather closely 
defined. The 19 firms were selected because they represented the known 
distribution of different lines of business in which the desired kinds 
of workers were employed. It was assumed that the employees of 
these concerns were representative of the universe as to types of work 
and salary distribution. Undoubtedly many failed to reply, but, since 
the original group was so carefully selected, the reply of any employee 
was just as acceptable as that of any other. As long as a sufficient 
number of replies was secured the information regarding housing 
could be considered as representing the entire group. 

The weighted method: The weighted method of controlled sam- 
pling is, in its initial steps, the same as inclusive sampling. That is, a 
definite effort is made to secure cases in the sample that will represent 
the known characteristics of the universe. As a further step, however, 
the sample is again consciously adjusted after it has been collected in 
order to bring it into closer conformity with these known general 
characteristics. In this process none of the cases is dropped from the 
sample, but all are grouped and weighted in order to give each group 
the importance that previous knowledge of the universe indicates it 
should have. 

This method was used in predicting the results of the presidential 
election in 1936, a detailed description of which is provided in an 
explanation of the work of the American Institute of Public Opinion 
(Gallup Polls). 




The weighted-sample technique assumes that it is possible to isolate and 
measure factors, or groupings, which determine the distribution of the variable 
in question. The method of the weighted sample tries to choose from the 
many possibilities the important determinants of voting behavior. The sample 
is then constructed by preserving in the miniature population the ratios of the 
selected groupings which hold for the total population. 

The problem of the selection of the significant groupings in the voting 
population was solved in this poll by experimentation. Gallup tried distributing 
straw-vote returns according to various factors. Those which showed an even 
distribution of ballots between the major candidates were discarded. Five con- 
trols were finally chosen. First, ballots returned from each state were to repre- 
sent the correct proportion of the state's population to the national population. 
Second, the ratio of farm and city votes in each state was to be maintained. 
Third, the correct percentage of voters in each income group had to be 
represented. Fourth, the ballots returned were to reflect accurately the propor- 
tion of young people who had come of voting age since the last election. 
Fifth, the return was to come from the correct percentage of people who voted 
for Roosevelt, Hoover, Thomas and others in 1932. 

The distribution of ballots in the proper proportion is, however, only half 
the story. Ballots leave the polling office in the proper ratio according to the 
factors mentioned. But the correct ratio is seldom maintained after the round 
trip from office to voter to office. As a rule less than one-fifth of the mailed 
ballots are returned and these tend to come from selected groups. People with 
intense opinions (reformers, arch-conservatives, radicals) are more likely to 
return ballots than those who are luke-warm or undecided; more highly edu- 
cated and economically secure persons take a greater interest in the ballots and 
feel more free to answer them. The American Institute found that the largest 
response (about 40 per cent) came from people listed in Who's Who. Eighteen 
per cent of the people in telephone lists, 15 per cent of the registered voters 
in poor areas, and 11 per cent of people on relief returned their ballots. Men 
are more likely to reply than women. 

These peculiarities in the mail response of the sampled population are 
counteracted in two ways: using interviewers, and adjusting the final number 
of ballots according to the original quota-controls. The Institute had some 200 
interviewers scattered throughout the nation. The answers they gathered con- 
stituted one-third of the final return for the Institute poll. Interviewers can be 
used advantageously where the mail ballot is not likely to succeed: in relief 
districts, farms, and working class areas. 11 

A partial description of the method by which the returns were 
adjusted according to the control groups may serve as a guide to the 
general application of this method. 

The criteria used as controls are established from known data such 



11 Daniel Katz and Hadley Cantril, "Public Opinion Polls," Sociometry, Vol. I 
(1937), pp. 159-60. 






as United States population by states, farm and non-farm population 
of each state, age groups of the population, and the 1932 election 
returns by states. The proportionate distribution of replies received 
is compared with the distribution in the "control" group for each of 
these criteria successively in order to secure for the votes cast in the 
straw ballot a redistribution that will be in every essential truly repre- 
sentative of the total voting population. 

The adjustments according to four of the controls are made by 
states. As an example of the method of procedure, suppose that after 
the special interviews the straw ballots received from New York State 
were distributed as in Table 10, according to farm and non-farm voters. 

TABLE 10 

STRAW BALLOTS IN NEW YORK STATE, 1936 
ORIGINAL RETURNS FROM FARM AND NON-FARM VOTERS 
(HYPOTHETICAL DATA) 

                                   CHOICE OF CANDIDATE 
VOTERS                        Roosevelt     Landon     Thomas      TOTAL 

Farm 
  Number                          2,500      1,975         25      4,500 
  Percentage distribution          55.6       43.9        0.5      100.0 

Non-Farm 
  Number                         75,000     25,000      2,000    102,000 
  Percentage distribution          73.5       24.5        2.0      100.0 

Total number                     77,500     26,975      2,025    106,500 


The first question is whether the proportions of farm and non-farm 
voters conform to the census distribution. Of the total 106,500 straw 
ballots cast, 102,000, or 96 per cent, were by non-farm voters. Accord- 
ing to the census, however, the population of New York State is 94 
per cent non-farm and 6 per cent farm. If the 106,500 straw ballots 
had been divided in that proportion, there would have been 6,390 farm 
instead of 4,500, and 100,110 non-farm instead of 102,000. These 
corrected figures are therefore substituted for the total ballots cast. 
They are then redistributed as to choice of candidate according to the 
same percentage distribution that was found from the actual ballots. 
That is, the percentage distributions shown in Table 10 are applied 
to the new totals for farm and non-farm giving the choices for each 
candidate by the farm and non-farm voters as shown in Table 11. The 
corrected state totals for each candidate are obtained by adding the 
corrected farm and non-farm votes in each case. It will be noted that 
the grand total for the state, 106,500, has not been altered. 
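
The arithmetic of this adjustment can be sketched in a few lines of Python. The sketch is illustrative only: the function name `reweight` and the data layout are assumptions made here, while the ballot counts are those of Table 10 and the 6 and 94 per cent census shares are those quoted in the text.

```python
# Illustrative sketch of the farm / non-farm adjustment described above.
# Ballot counts are from Table 10; census shares are from the text.
# The function name and data layout are assumptions, not the Institute's method.

ballots = {
    "farm":     {"Roosevelt": 2_500,  "Landon": 1_975,  "Thomas": 25},
    "non_farm": {"Roosevelt": 75_000, "Landon": 25_000, "Thomas": 2_000},
}
census_share = {"farm": 0.06, "non_farm": 0.94}


def reweight(ballots, census_share):
    """Rescale each group to its census share of the grand total,
    keeping each group's own split among the candidates unchanged."""
    grand_total = sum(sum(group.values()) for group in ballots.values())
    corrected = {}
    for name, group in ballots.items():
        old_total = sum(group.values())
        new_total = grand_total * census_share[name]
        corrected[name] = {
            cand: votes / old_total * new_total
            for cand, votes in group.items()
        }
    return corrected


corrected = reweight(ballots, census_share)

# Corrected state totals: add the farm and non-farm figures per candidate.
state_totals = {
    cand: sum(corrected[group][cand] for group in corrected)
    for cand in ballots["farm"]
}

for cand, votes in state_totals.items():
    print(f"{cand:<10}{votes:>12,.0f}")
```

Because the sketch carries unrounded proportions, its figures differ by a few ballots from those obtained by applying percentages rounded to one decimal place, as the text does. The grand total of 106,500 is unchanged in either case.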






TABLE 11

STRAW BALLOTS IN NEW YORK STATE, 1936
CORRECTED FOR FARM AND NON-FARM POPULATION;
DATA FROM TABLE 10

                                   CHOICE OF CANDIDATE
VOTERS                        Roosevelt    Landon    Thomas      TOTAL

Farm
  Number                          3,553     2,805        32      6,390
  Percentage distribution          55.6      43.9       0.5      100.0

Non-Farm
  Number                         73,581    24,527     2,002    100,110
  Percentage distribution          73.5      24.5       2.0      100.0

Total number                     77,134    27,332     2,034    106,500



The same process can now be repeated for the other three controls 
within each state starting with the original returns for each of the 
three. The four totals thus arrived at according to the four criteria 
can then be averaged to give the true representation for New York 
State. After similar adjustments have been made for each state the 
results for the 48 states are ready to be combined according to the fifth 
criterion, the proportion of each state's population to the United States 
total. In this last step the number of ballots cast in each state is ad- 
justed but the actual total number of ballots cast in the United States 
remains unchanged. 
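
The last step, in which each state's ballots are scaled to the state's share of the national population while the national total stays fixed, might be sketched as follows. The three states and every figure below are invented for illustration, since the text gives no state data; only the procedure follows the description above, and the function name `combine_states` is an assumption.

```python
# Hypothetical sketch: combine state results according to population share.
# All states and numbers are invented for illustration.

state_ballots = {  # adjusted ballots per candidate, by state
    "State A": {"Roosevelt": 600, "Landon": 400},
    "State B": {"Roosevelt": 300, "Landon": 700},
    "State C": {"Roosevelt": 500, "Landon": 500},
}
population_share = {"State A": 0.50, "State B": 0.20, "State C": 0.30}


def combine_states(state_ballots, population_share):
    """Scale each state's ballots to its population share of the
    national total, then add the scaled ballots per candidate."""
    national_total = sum(sum(b.values()) for b in state_ballots.values())
    national = {}
    for state, by_candidate in state_ballots.items():
        state_total = sum(by_candidate.values())
        target = national_total * population_share[state]
        scale = target / state_total
        for cand, votes in by_candidate.items():
            national[cand] = national.get(cand, 0.0) + votes * scale
    return national


national = combine_states(state_ballots, population_share)
print(national)
```

Each state's ballot count changes, but the sum over all states equals the number of ballots actually cast, as the text requires.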

Through the introduction of these five independent controls, the 
internal distribution of cases in the sample might be considerably 
altered, but such alteration is made in order to adjust discrepancies 
between the collected information and the known occurrence of the 
five control characteristics in the universe. As a result of these altera- 
tions the cases are distributed in the sample so that the unknown 
variable characteristic, choice of candidate in 1936, can be studied 
without distortion arising from failure of one or more of the known 
characteristics to be represented properly. 

Summary. Before closing this discussion it should be pointed out 
that the three methods of obtaining a representative sample depend 
upon the principle of statistical regularity in different ways. (1) In an 
extensive sample the principle has its purest application when the 
appearance of the characteristics of the universe in the sample is left 
entirely to chance. (2) In a selective sample the investigator's knowledge
of certain characteristics of both universe and sample is substituted
for random choices. He is still depending on the sample to give
him information regarding certain unknown characteristics of that
universe. (3) In an inclusive sample, either unweighted or weighted, 
the known characteristics of the universe are definitely projected into 
the conditions of the sample, but the appearance of unknown or un- 
controlled characteristics is left to chance through random selection of 
the actual cases. None of these methods can be used in automatic 
fashion. Careful planning by the investigator will always be needed. 
His two most valuable assets will be experience and the exercise of 
good judgment. 

PROBLEMS 

1. a) Would the members of your statistics class be a representative sample
of the students of your school as to height? weight? age? grades?
hair color? eye color? Discuss.

b) Would the members of the class be a representative sample of all 
college students with respect to the characteristics listed? Discuss. 

2. If a large number of samples, each including 400 cases, show an average 
variability of 4 per cent from a known result, how large a sample would 
be required to confine the variability to 1 per cent? to 3 per cent? to 
8 per cent? to .5 per cent? 

3. Why is there less precision in the results of the dice example (Table 7) 
than in the coin example (Table 6)?

4. A retail gasoline station proprietor wished to obtain from his customers 
the following four types of information: The average mileage of cars per 
gallon of gasoline, the name of the manufacturer of the tires on the cars, 
the place of residence of the customers, the proportion of customers using 
premium gasoline. One of these types of information could be obtained 
only by the census method, one by a relatively small sample, one by a 
relatively large sample, and representative information on one of them 
probably could not be obtained either by sampling or census. Identify the 
four types of information according to the preceding description. 

5. Basing your answer on the quoted paragraph at the bottom of page 80, 
and Table 9, discuss the question of whether the Federal Trade Com- 
mission's gross sample was representative of large and small chain stores. 

6. What are the essential differences between uncontrolled sampling and con- 
trolled sampling? Between extensive sampling, selective sampling, inclu- 
sive sampling, and weighted sampling? 

7. Given the following information concerning the 25,900 farms in five
counties in 1940:

                     OWNERS                      TENANTS
COUNTY      Sell Milk   Do Not Sell Milk   Sell Milk   Do Not Sell Milk

A                 243                608         110                542
B                 437                219         633                461
C               1,190              2,166       2,240              1,892
D                 946                239       1,412                928
E               1,762              2,913       2,812              4,147

Total           4,578              6,145       7,207              7,970

a) Set up the distribution of a sample of 400 cases to be collected from 
these counties by agents to obtain information concerning "the number 
and breed of cows used in dairy herds of farmers selling milk, and 
the average daily production of milk per cow." 

b) Set up the distribution of a sample of 400 cases collected by agents 
to obtain information concerning "the difference in living standards, 
if any, between farmers who sell milk and those who do not."

c) Is the sample of 400 large enough in each of the preceding investigations,
i.e., is the number of cases in the sub-groups sufficient to provide
for proper operation of the principle of statistical regularity?
Could the sample contain less than 400 cases? Discuss.

8. Suppose that an investigation by the sampling method were to be made 
of the extent of employment, unemployment, and part-time employment in 
a city of 500,000 population. The committee in charge would have to 
consider the following methods: 

A. Using schedules in the hands of agents 

1. Visit one house on each side of the street in each block of the 
city 

2. Select in advance representative blocks in the city and visit each 
house in those blocks 

3. Start from a common point with areas whose boundaries run 
out from the center like spokes in a wheel and instruct the agents 
to proceed within their areas until 

a) They have secured 1,000 completed schedules 

b) They have visited 1,000 houses 

4. Get names and addresses of unemployed from the local welfare 
bureau and names and addresses of employed from 20 leading 
employers. Visit the former to obtain data on unemployment 
and the latter to obtain data on employment and part-time 
employment. 

B. Using mail questionnaires 

1. Address the occupant at number 13 of every series of 100 street 
numbers, for example, send a questionnaire to the occupant at 
13 Englewood Ave., 113 Englewood, 213 Englewood, etc. 




2. Address every tenth person in the telephone directory 

3. Address every twentieth person in the city directory 

4. Address the first 50 persons in each letter of the alphabet in 
the telephone directory. 

a) Discuss the likelihood of securing a representative sample by each
method, (1) in the "A" group, (2) in the "B" group.

b) Rate the four methods of each group according to the kind and degree
of control exercised in the sample.

c) How would you obtain this information by a completely uncontrolled 
sample? 

REFERENCES 

BROWN, LYNDON O., Market Research and Analysis. New York: The Ronald
Press Co., 1937. 

Chapter 10 deals with the planning of a sample. 

HARPER, F. H., Elements of Practical Statistics. New York: The Macmillan 
Co., 1930. 

The presentation of principles and methods of sampling in chapter I is 
very stimulating, particularly the discussion of weighted sampling. 

Standards of Research. Des Moines, Iowa: Meredith Publishing Co., 1929. 
Pages 27 and 28 give a specific statement of the policy followed in dis- 
tributing questionnaires. 

YULE, G. UDNY, and KENDALL, M. G., Introduction to the Theory of Statistics, 
London: Charles Griffin and Co., Ltd., 1937. 

Chapter 18 contains an unexcelled statement of the theoretical background 
of sampling. 



CHAPTER VI 
COLLECTION OF DATA: DIRECT SOURCES

DESCRIPTION OF DIRECT SOURCES 

IN CHAPTER IV direct sources were defined as the business 
concerns, government and private agencies, and individuals from 
whom statistical information not otherwise available could be 
secured by direct appeal. The type of source to which appeal will be 
made depends upon the kind of information desired. The internal 
records of business concerns are the original sources of such data 
as sales, profits, costs of doing business, employment, wages, and prices. 
Most of the information is private and can be obtained only on a 
confidential basis, but business concerns have come to realize the ad- 
vantage to themselves and to the public of supplying the information 
for statistical purposes, provided the use made of it is not detrimental. 
Hence a large amount of valuable information may be obtained 
directly from business concerns. 

The next sources from which information can be obtained are gov- 
ernment and private agencies. Government agencies here refers not 
so much to those engaged in collecting and publishing statistical 
information as to those directly concerned with the control or regula- 
tion of business. Examples are the Board of Governors of the Federal 
Reserve System, the Federal Trade Commission, and the state public 
utility commissions. Agencies such as these are in a position to supply 
a great amount of information on special subjects in addition to that 
which they publish. Private agencies include trade associations, labor 
organizations, industrial institutes, charitable organizations, statistical 
services, research bureaus, and co-operative groups. In many cases 
these agencies are more valuable sources than business concerns. This 
is particularly true when information for an industry or an area is 
desired rather than for individual firms. 

Finally, the statistician who is interested in data relative to con- 
sumption habits must expect to get his information from individuals 
or family groups. This is perhaps the most difficult source from which 
to obtain data because of the large number of persons who must be 
canvassed in order to get enough data for statistical purposes, and
because individuals so frequently do not possess the desired information
or are unable to give it accurately even though it concerns
themselves. 

There is considerable difference in the actual collection process 
from direct sources according to whether or not the data exist in the 
files of the informant in such a form that they can be transferred to 
collection blanks. Collection from business firms and similar organi- 
zations quite commonly means merely transferring data from the 
records. On the other hand collection from individuals may require 
a lengthy process of interviewing to secure the information wanted. 

COLLECTING DATA FROM DIRECT SOURCES 

Once it has been determined that the collection of data must be 
made from direct sources, and the preliminary decisions concerning 
census or sample and the use of agents or mail questionnaires have 
been made, the investigator is ready to put his plan into operation. 
The work follows a natural sequence of steps regardless of the size 
of the investigation. The several steps are: (1) the provision for 
physical equipment; (2) a preliminary study of the field; (3) the 
choice of cases for the sample; (4) preparation of agents' schedules 
or mail questionnaires; (5) the selection and training of a staff; 
(6) supervising the work of collection. 

There is literally no end to the amount of detail which might be 
introduced in discussing these steps. The intention is to present no 
more explanation than is necessary to give a broad view of the work. 
There is a wealth of reference material available from which more 
detailed information can be obtained. 

The Provision of Physical Equipment 

An office must be set up as a headquarters for the investigation. 
In some cases only pencil and paper and a place to work are needed. 
In general, however, equipment for filing, tabulation and calculation, 
typewriters, forms for recording the progress of the work, and similar 
materials must be provided. 

A Preliminary Study of the Field 

No matter how carefully the general plan of an investigation has 
been developed there will be certain peculiarities which need to be
discovered and provided for before starting the actual collection of
data. A preliminary study may bring them to light and at the same 
time pave the way for the regular work. If there are technical terms 
used in an industry, these should be known in advance. A knowledge 
of the form in which records are kept and the units in which data are 
recorded will aid in phrasing questions. The advice of leading firms 
or agencies will be useful in showing the proper method of approach 
to others who are to be canvassed. This advice will be particularly 
valuable, if there are some concerns that are difficult to approach. 

A common practice is to test a preliminary draft of questions by 
submitting them to a small sample of those from whom the informa- 
tion is to be obtained. The knowledge acquired in this way will aid 
in preparing the final draft of the questions, provide the background 
for improved agent technique, and create advance good-will for the 
investigation.

The Choice of Cases for the Sample 

The method of selecting the cases to be included in a sample is one
of the most vital steps in the entire collection process. For that reason 
all of chapter V is devoted to an explanation of the principles of sam- 
pling and the methods of choosing samples. The exact plan to be 
followed in selecting cases must be worked out in advance by the 
director and the importance of conformity to the plan must be im- 
pressed upon everyone connected with the investigation. If any 
subsequent change in the plan of sampling becomes necessary, such 
change should be made only with the knowledge of the director. For 
example, if a field agent in a consumer survey has been assigned a 
particular family, and finds it unwilling to give information, he should 
not try the house next door but should obtain a new assignment from 
the office.

Preparation of Schedules and Questionnaires 

The success of an investigation depends to a large extent upon the 
quality of the questions used. There will be considerable difference in 
the type of question included depending upon whether schedules in 
the hands of agents or mail questionnaires are employed. Agents can 
generally secure replies to questions which are more involved and more 
personal than those on mail questionnaires. In spite of this difference, 
the two types of lists of questions can best be discussed together with
separate explanations of the points that refer to one and not the other.
There are four things to be considered in preparing a schedule or 
questionnaire: (1) content, (2) wording of the questions, (3) defini- 
tions, and (4) form. 

Content. In outlining the content of a schedule or questionnaire, 
the guiding principle is unity. The questions must be determined in 
terms of the objective of the investigation. Only those questions 
should be included which contribute directly or collaterally to the 
objective. Further, the questions must be so planned that the replies 
can be tabulated to yield answers to the questions proposed at the 
outset of the study. This requires that careful consideration be given 
to the ultimate goal of the investigation. 

The object of a WPA project in an eastern city was to study the 
extent of the repair and modernization work which might be antici- 
pated in the city under Title I of the Federal Housing Act of 1934. 
The schedule of questions in Figure 2 was prepared for the study. For 
multiple family dwellings a schedule was to be filed for each dwelling 
unit in the building. 

FIGURE 2 

SCHEDULE USED IN A REAL ESTATE SURVEY 

1. How many occupants? 

2. How many rooms? 

3. Basement? 

4. Stories? 

5. Single or double garage? 

6. Electric refrigerator? 

7. Rent? 

8. When was house built? 

9. Owner or renter? 

10. How long has occupant lived in house? 

11. Automobile? 

12. Use auto for work? 

13. How long to go to work? 

14. How many in family are working? 

15. What kind of heat? 

16. Fuel used? 

17. Single or double house? 

18. Is house in good condition? 

19. Who pays water rent? 

If questions 5 and 11 disclosed the fact that the family had an automobile
and no garage, or a single garage and two automobiles, presumably
that family could be interested in garage construction. If question 15
showed that the house had no central heating system or had an anti- 
quated system, perhaps the family would be interested in improved 
heating installation. If the answer to question 18 was a simple nega- 
tive, further investigation of the house would be necessary in order 
to determine whether the deficiency was lack of paint, a rotting porch, 
a leaking roof, defective plumbing, or other needed repairs. Whatever 
the deficiency turned out to be, the family could presumably be inter- 
ested in remedying it. 
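
The way replies on the schedule are combined to reach such conclusions can be sketched as a simple rule applied to each completed schedule. The field names below are hypothetical, and the sketch checks only whether central heat is present; it does not try to judge whether an existing system is antiquated.

```python
# Hypothetical sketch: derive likely repair and modernization prospects
# from one completed schedule. Field names are assumptions for illustration.

def repair_prospects(schedule):
    """Return the kinds of work a household might be interested in."""
    prospects = []
    # Questions 5 and 11: more automobiles than garage space.
    if schedule["automobiles"] > schedule["garage_capacity"]:
        prospects.append("garage construction")
    # Question 15: no central heating system.
    if not schedule["central_heat"]:
        prospects.append("heating installation")
    # Question 18: a negative answer calls for further investigation.
    if not schedule["good_condition"]:
        prospects.append("follow-up inspection")
    return prospects


example = {
    "automobiles": 2,
    "garage_capacity": 1,
    "central_heat": True,
    "good_condition": False,
}
print(repair_prospects(example))
```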

Some of the questions such as 6, 12, 13, 14, 16, and 19 are difficult 
to justify in this schedule; therefore in the revised form, Figure 3, 
they have been omitted. The proposed revision is designed to give 
more information concerning the repairs and modernization needed 
and to facilitate collection and tabulation. Agents using the revised 
schedule could save much time and effort because they would not be 
forced to ask any irrelevant questions and a better impression would 
be made on the informant. 

FIGURE 3 

PROPOSED REVISION OF REAL ESTATE SCHEDULE, FIGURE 2 
HOUSE 

(1) Address

(2) Stories: 1 2 3 4 B A (3) Single Double Other

(4) Year built (5) Garage: 1 2 3 or more

DWELLING UNIT 

(6) Floor (7) Years lived in by present occupant 

(8) Owner Renter (9) Monthly rent 

(10) No. of rooms bath (11) No. of occupants 

(12) No. of automobiles owned 

(13) Heating equipment central heat: Yes No 

(14) Hot air steam hot water 

(15) Any repairs needed: Yes No 

REPAIRS NEEDED 

(16) House: (17) Dwelling unit: 

Paint Electric wiring 

Porch Plumbing 

Roof Heating system 

Sidewalk Other 

Driveway 

Garage 

Other 




Wording of the Questions. When schedules in the hands of agents 
are used as in the real estate survey of the preceding section, it is not 
necessary to word questions in so much detail as in a mail question- 
naire. Since the agents are already familiar with the meaning of each 
question and the definition of terms, the abbreviated form used in 
Figure 3 is better than the sentence form of questions in Figure 2. 
It is easier for the agent to check the answer wherever possible than 
to write several words, and the uniform marking greatly facilitates 
tabulation. Where there may be a variety of answers a space is left 
for writing. For example, the answer to number 6 would be "whole 
house" for a single family dwelling. 

The wording of a mail questionnaire is analogous to the agent's 
conversation with the informants. Since the questionnaire must be 
filled out by the respondent himself, the questions must be complete 
sentences and must make their own appeal. Certain practices in word- 
ing questions have been found so effective as to have almost the force 
of rules. 

The wording must be clear to the respondent: Each question should
contain but one idea. It must be stated as simply as possible so that 
there can be no doubt in the mind of the respondent what is wanted. 
Care should be exercised also to avoid the possibility of an ambiguous 
answer. For example, the following questions and answers are taken 
from an investigation made some years ago of the status of the Negro 
in industry. 

1. In what jobs are both Negroes and whites employed?   Common labor
2. In what jobs are only Negroes employed?              Common labor
3. In what jobs are only whites employed?               Foremen and Mechanics



These questions appear to be simple and straightforward. The inves- 
tigator felt that there could be no doubt that they would be clear to 
employers to whom the questionnaire was sent. Yet impossible answers 
such as those recorded for questions 1 and 2 were received in a large 
number of the replies. Apparently the person filling out the ques- 
tionnaire disregarded the word "only" in the second question. The 
maker of a questionnaire cannot expect that respondents will read 
questions as discriminatingly as was required in this case. 






The work of the respondent must be kept to a minimum: There are 
several things to consider in complying with this rule. The use of a few 
easily answered questions in a questionnaire will increase the per cent 
of replies. If the answers can be given in a few minutes, the respon- 
dent is likely to fill them in immediately, whereas a list requiring more 
time may be laid aside and never picked up again. The number of 
replies is increased by the use of questions answered by "yes" or "no," 
or with easily obtained numerical answers or with a list of colors, 
qualities, places, etc., from which the respondent can check or underscore
the applicable ones.

The respondent should not be asked to make computations. Hence 
the question, "What is your annual remuneration?" is not a good one 
to ask a laborer for two reasons. (1) Not only is the word "remunera- 
tion" foreign to his vocabulary, but (2) he may be unable to state his 
earnings except by the day or week. Requests for past information
should be avoided if possible. The United States Department of Agri- 
culture was not likely to obtain much usable information from the 
following request sent to farmers September 15, 1930: 

                                              1929   1928   1927   1926
 1. Acres sown to wheat in the summer or
    fall of each year

                                              1930   1929   1928   1927
 2. Acres sown to wheat in the spring of
    each year
 3. Acres of wheat harvested in the summer
    of each year
11. Actual cost of storing in local station
    elevator, per bushel, per month in each
    year

Make sure that no unnecessary repetition of information is re- 
quested. Two adverse results arise from failure to heed this warning: 
(1) The duplication adds to the length of the questionnaire and may 
be the cause of its being discarded. (2) The impression in the mind 
of the respondent created by the duplication is likely to be hostile to 
the point of causing him to discard the questionnaire. Examples of
repetitious and overlapping questions are found in the following list 
selected from a questionnaire sent to state hospitals by a social agency: 

6. To what extent does overcrowding express itself in unsuitable sleeping
quarters?

11. Do you need additional employees? 
15. Do you have adequate hospital and medical facilities? 




19. Do you have adequate facilities for giving your inmates instructive 
work and recreation? 

21. Are your facilities for academic work sufficient? 

22. (c) Is your staff of teachers large enough? 

28. Are inmates paroled whom you deem unfit to return to the community? 

29. What are the outstanding needs of your institution? 

Bed capacity Medical equipment 

Employees Academic equipment 

Teachers Recreational facilities 

Opportunities for work Extended parole 

The last question merely asks for information already covered by the 
preceding questions. Questions 15 and 19 each combine two separate 
ideas. There are other faults in the wording of these questions which 
will be referred to later. 

Form and content must not be offensive: A great amount of per- 
sonal and business information can be obtained by the use of ques- 
tionnaires, but great care must be exercised to avoid offense. One 
cannot ask the question, "What was the dollar value of your net sales 
last year?" But the approximate data may be secured by asking, 
"Please indicate in which of the broad groups below your net sales 
for last year would fall," followed by several sales classes arranged 
to give enough detail for use in the subsequent steps of the investiga- 
tion. A question may not be personally offensive, but may involve 
official complications. In the example quoted in the preceding section,
a hospital superintendent might well hesitate to answer question 28, 
fearing to give offense to the parole board and to politicians. A ques- 
tion stating or even implying moral turpitude should be avoided. 
Likewise questions dealing with religious principles or habits should 
be used with caution. 

Bias must be avoided: Bias may enter in two ways. First, the ques- 
tion may be phrased so as to lead to a certain answer. An example of 
a biased wording is, "Did the fish cakes taste better to you than canned
salmon, salted cod, or shredded cod?" It would be much better to 
list the four types of prepared fish and request that the user number 
them in the order of preference. 

Second, estimates that are based on opinions rather than on actual 
figures may be biased. Suppose you were inquiring of a manufacturer 
of drugs whether his product was distributed at retail mainly through 
chain stores or independent stores. His direct contacts with the buyers 
of chain retailers might lead him to suppose that they were his chief
customers, whereas a study of the sales records might well show the
reverse. 

Answers should be obtained in the most usable form: This is
essentially a matter of visualizing the subsequent use of the data. In 
particular all units should be carefully selected and defined from the 
point of view of the subsequent analysis. When new information is 
to be used in conjunction with some already in hand be sure that the 
new information will be comparable with the old. 

It is also essential that the information be received in a form which 
facilitates tabulation and analysis. In most questionnaires having more 
than two or three questions cross-information becomes available by 
using the results of two or more questions together. It is important 
to plan the questions so as to develop the maximum amount of cross- 
information. Conversely, failure to consider this feature of the ques- 
tionnaire may lead to a serious hiatus in the information which will 
be discovered too late to be remedied. Figure 4 shows the advantages 
of foreseeing the subsequent parts of the work when making the 
questionnaire. 

The three blank spaces marked "Leave blank for Department use" 
permit the tabulation on this form of the number of spindles in the 
mill, the number being currently operated, and the number belted and 
ready to operate (active spindles as defined in the industry). Infor- 
mation is also available from which to compile an age distribution of 
total spindles and active spindles in the mill. Likewise a distribution 
by kind of spindle can be made for total spindles and active spindles 
and these can be further classified by age, if desired. Thus we see 
the amount of information that can be taken from a carefully prepared 
questionnaire such as this one. All of the tabulation forms were drawn 
up at the same time the questionnaire was prepared and the two were 
made to conform at every point. 
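
The notion of cross-information can be illustrated with a small tabulation sketch. The records, field names, and category labels below are hypothetical and are not taken from Figure 4; the point is only that two answers recorded on the same form can be tabulated jointly.

```python
from collections import Counter

# Hypothetical returns: each record is one line of a spindle questionnaire.
returns = [
    {"kind": "ring", "age": "under 10 yrs", "spindles": 4_000, "active": 3_500},
    {"kind": "ring", "age": "10 yrs and over", "spindles": 6_000, "active": 4_000},
    {"kind": "mule", "age": "10 yrs and over", "spindles": 2_000, "active": 500},
    {"kind": "ring", "age": "under 10 yrs", "spindles": 1_000, "active": 1_000},
]


def cross_tabulate(records, row_key, col_key, value_key):
    """Sum value_key for every (row, column) combination found."""
    table = Counter()
    for rec in records:
        table[(rec[row_key], rec[col_key])] += rec[value_key]
    return table


# Total and active spindles, classified by kind and by age together.
total_by_kind_and_age = cross_tabulate(returns, "kind", "age", "spindles")
active_by_kind_and_age = cross_tabulate(returns, "kind", "age", "active")

for cell, count in sorted(total_by_kind_and_age.items()):
    print(cell, count, active_by_kind_and_age[cell])
```

Had kind and age been asked on separate forms, or in units that could not be matched, the joint classification would not be available; this is the hiatus the text warns against.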

Definitions. In preparing a schedule or questionnaire, any word, 
phrase, or technical term which may lead to variation of interpretation 
should be defined. The units in which the data are to be collected 
must also be defined. These definitions are equally important in the 
case of schedules and questionnaires, and the necessity for preliminary 
decisions regarding the terms to be used is the same in either case. 
However, it is evident from the rules regarding questionnaire con- 
struction that if a great amount of detailed definition proves necessary 
the use of the mail questionnaire is to be avoided. Whatever definitions
are essential in a questionnaire should be printed as close as
possible to the questions to which they apply. The definitions usually
do not appear on a schedule but should be printed along with the
general instructions to the agents.

[FIGURE 4: questionnaire form, not legible in this copy. Recoverable text: "Check kind of spindle below"; spaces marked "Leave blank for Department use."]

Terms: For either method of investigation, the definitions must 
be inclusive of all the limitations that have been placed upon the col- 
lection process. They must be so precisely worded that (1) no am- 
biguity of terms exists; (2) no limitations of terms are left indefinite; 
and (3) no technical uses of terms are unexplained. Some examples 
will show the necessity for careful wording and definition.

The treasurer of a department store submitted a list of questions 
to the department heads of the store. One of them read, "Have you 
been successful recently with promotions?" There may be no doubt 
as to what is wanted here, but the simpler thing would be to specify 
sales promotions rather than promotions of the staff. 

Referring to the questionnaire sent to state hospital superintendents 
(p. 98), question 6 reads, "To what extent does overcrowding express 
itself in unsuitable sleeping quarters?" The phrase "to what extent" 
is indefinite. Such words as "unsuitable," "adequate," and "sufficient" 
as used in this questionnaire are meaningless unless related to definite 
standards. 

A questionnaire sent to college and university teachers contained 
this question, "What per cent of your regular salary goes for rent?" 
This question appears to be simple but the word "regular" is used in 
a technical sense. Presumably it means the compensation for teaching 
the usual number of hours per week for a nine months' period. This 
definition would exclude evening and extension school salary in some 
cases and include it in others. Summer school salary would not be 
considered "regular" even for a person who taught every summer. 
Many other variations exist in different colleges. The word "regular" 
requires exact definition if the results obtained from the questionnaire 
are to be comparable. 

Units: Every kind of recording of numerical information requires 
that a unit be established in which to perform the process of
enumeration. The unit may be a person, an animal, an inanimate object such
as a tree or a house, a measured quantity such as a ton or a bushel, 
a money measure such as the dollar or the franc, an abstract concept 
such as an order, an accident, or a vacation. In some cases the type of 
enumeration to be made immediately determines the unit to be used,
as in counting population or recording sales. In other cases a choice 
of units is available as in recording production of cement in which the 
count could be made in tons, barrels, or dollars of value. In every case 
in which the unit is not obvious from the nature of the enumeration, 
selection of a unit must precede the counting process. 

Once the unit has been selected consideration must be given to the 
question of whether any uncertainty may arise in its use. In many cases 
definition will be required to avoid ambiguity; thus in collecting infor- 
mation concerning the size of houses careful definition of what to 
count as rooms is necessary. Similarly, in recording industrial accidents, 
a careful statement must be made of what sorts of injuries are to be 
included as accidents.

Units can be divided into two kinds: (1) those with definition 
established by law or custom, and (2) those for which the definition 
must be established separately wherever they are used. Examples of
the first kind are the bushel, the gallon, the yard, the hour, etc. Each
of these measures carries a standard definition which serves as an
adequate description any time it is employed as a unit. The unit for
measuring wheat is the bushel, and no further definition is required. 
On the other hand when the unit used is a ship, a room, a voter, or 
a horse it is necessary to explain what shall be counted and what shall 
be omitted during the enumeration. Thus a room in a dwelling is 
not a usable unit until many borderline cases such as closets, breakfast 
nooks, pantries, and sun-rooms have been either included or excluded 
by definition. If a subsequent investigation is made using a different 
definition of a room, the results of the two investigations cannot be 
compared although both use the unit "room" as the basis for counting. 
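The effect of two different definitions of a "room" can be sketched in a few lines of Python. This is only an illustration: the names of the spaces and the two definitions (DEFINITION_A and DEFINITION_B) are assumptions made for the example and are not taken from any actual investigation.

```python
# Hypothetical illustration: the space names and both definitions are assumptions.
DEFINITION_A = {"bedroom", "kitchen", "living room", "dining room", "sun-room"}
DEFINITION_B = DEFINITION_A | {"breakfast nook", "pantry"}


def count_rooms(spaces, definition):
    """Count only the spaces that the stated definition admits as rooms."""
    return sum(1 for space in spaces if space in definition)


dwelling = ["bedroom", "bedroom", "kitchen", "living room",
            "breakfast nook", "pantry", "closet"]

rooms_a = count_rooms(dwelling, DEFINITION_A)  # nook, pantry, and closet excluded
rooms_b = count_rooms(dwelling, DEFINITION_B)  # only the closet excluded
print(rooms_a, rooms_b)  # 4 6
```

The same dwelling is recorded as four rooms under one definition and six under the other, which is why the definition must be stated with the results.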

These two types of units are usually known as measurement units 
(fixed definition) and counting units (variable definition). To call 
the latter type counting units seems somewhat ambiguous because the 
process of enumeration involves counting regardless of the type of 
unit used. For that reason we prefer the distinction based on the 
amount of definition required.

Variable definition units such as a "room" exist independently of 
the counting process, a fact which explains the need for definition 
each time they are used. As separately existing entities they possess 
individual differences which inevitably run to limits where it is neces- 
sary finally to establish a boundary. The same situation does not arise 
when a unit such as the pound or mile is used. 




Definition is required to different degrees in the realm of variable 
definition units. A "person" needs little or no definition because the 
unit is so universally recognized. Likewise the unit "citizen," a part 
of the universe of persons, has a fixed legal definition in each country. 
On the other hand such a unit as a "salesman" or a "criminal," each 
a part of the universe of persons, requires very careful definition. Only 
a few units are so well recognized as the person or citizen, hence the 
general conclusion that one must always be prepared to state variable 
definitions completely. An example will illustrate what is likely to 
occur in particular cases. 

The instructions accompanying a schedule included this statement: 
"Information to be secured for wage-earners only." Trouble arose 
continually in determining exactly who were wage-earners. The gen- 
eral concept "wage-earners" was perfectly clear, but the definition of 
the statistical unit a "wage-earner" was extremely difficult. On the 
face of it a wage-earner should be one who receives compensation from 
others for services rendered. But questions such as the following were 
brought in by the agents daily: "How about a physician who received 
a salary instead of fees?" "How about a daughter living at home and 
working for her father at a nominal salary?" "How about an insurance 
agent working on commission and receiving a fixed percentage of the 
annual profit?" As soon as these questions were answered, others 
arose. The answers given to such questions as these depend upon the 
purpose of the investigation. But no matter how carefully any such 
unit as "wage-earner" is defined borderline cases will arise which will 
have to be settled arbitrarily by the director. 

Form. General: The primary consideration that determines the 
form of a schedule is convenience, whereas the first requisite of a mail 
questionnaire is good appearance. In either case the size, shape, mate- 
rial, and type of printing are important. Whenever possible cardboard 
rather than ordinary paper is to be preferred. Cards are easier to 
handle in an interview and more durable for editing and tabulating. 
If cards are not feasible then a good quality paper should be used. 
One sheet of questions is always preferable. However, if the choice 
is between overcrowding of questions and the use of an additional 
sheet, the latter is the lesser of two evils. Overcrowded questions are 
likely to result in misplaced answers, incomplete answers, and increased 
difficulty of tabulation. 

The first impression made by a mail questionnaire will determine to
a large extent whether it will be answered or discarded. A closely 
printed sheet or card immediately gives the impression of being lengthy 
and time-consuming. This can be partially overcome by well-spaced 
questions, the use of rulings, and variations in type sizes. 

Sequence of questions: The questions should be arranged so that 
they will form a natural sequence for the respondent. Although the 
sequence must be varied to meet the requirements of different investigations
and will not be identical for schedules and mail questionnaires,
some general principles can be stated. 

1. The initial question or questions should be simple. 

2. Any preliminary questions which are necessary to pave the way 
for the key questions should come next. 

3. The key questions, that is, those which relate to the major pur- 
pose of the investigation, should be placed at the end or near the end. 

4. If one or more questions calling for an opinion rather than a 
statement of fact are included in the schedule, they are usually placed 
at the end. 

Figure 5, taken from the questionnaire used early in 1935 in a
market survey of college men, employs a good sequence with one
exception.

FIGURE 5 
RADIO SECTION OF QUESTIONNAIRE USED IN SURVEYING THE COLLEGE MARKET

1. Do you have a radio in your college room? Yes No 

2. What make? 

3. When bought? 4. How many tubes? 

5. Where bought? College town Outside college town 

6. Do you intend to purchase a radio before 1936? Yes No 

7. If so, what make? 8. About what price? 



Questions 1, 2, and 4 provide the facts concerning the student's 
present radio. The questions are simple and the information is im- 
mediately available. Questions 3 and 4 should exchange places. 
Questions 3 and 5 give preliminary information leading up to question 
6. Question 3 was presumably intended to give information concern- 
ing the age of the radio. The answer will, of course, be misleading if 
the student bought the radio second hand. Question 6 is the key 
question of the schedule. It gives the information concerning the 
potential market for radios among college men during the current year.
Questions 7 and 8 correctly follow question 6 since they ask for infor- 
mation supplementary to it. 
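The principles of sequence, as applied to Figure 5, can be sketched as a simple ordering by the role each question plays. The role labels and their ranking below are assumptions made for this illustration; the roles assigned to the eight questions follow the discussion above.

```python
# Assumed ranking of roles: simple facts first, then preliminary questions,
# then the key question, then questions supplementary to it, then opinions.
ROLE_ORDER = {"simple": 0, "preliminary": 1, "key": 2,
              "supplementary": 3, "opinion": 4}

# Questions of Figure 5 in their printed order, tagged as in the text.
questions = [
    (1, "simple"), (2, "simple"), (3, "preliminary"), (4, "simple"),
    (5, "preliminary"), (6, "key"), (7, "supplementary"), (8, "supplementary"),
]

# sorted() is stable, so questions with the same role keep their printed order.
ordered = sorted(questions, key=lambda q: ROLE_ORDER[q[1]])
order = [number for number, _ in ordered]
print(order)  # [1, 2, 4, 3, 5, 6, 7, 8]
```

The only change from the printed order is the exchange of questions 3 and 4, which is the one exception noted in the text.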

Auxiliary material: In addition to the list of questions certain ex- 
planatory material must be prepared. Most important is a letter of 
transmittal to accompany a questionnaire or instructions to agents 
who collect schedules. The purpose of the letter of transmittal is to 
engage the attention of the addressee and encourage him to respond. 
The instructions to agents provide a background of information which 
will permit them to accomplish the same result by personal interview. 

Letter of Transmittal with a Questionnaire. Motives: In a mail
questionnaire the questions themselves may be preceded on the same 
sheet by a brief explanation of the purpose of the investigation and 
the reasons for requesting information from the particular persons 
to whom the questionnaire is sent. The usual method, however, is to 
inclose a letter of transmittal explaining the questionnaire and pointing 
out some incentive for answering. When sent to business men it is also 
customary to include a duplicate copy of the questionnaire for the 
respondent's own files. 

Some of the motives to which an appeal for voluntary replies can 
be made are: (1) co-operation; (2) interest; (3) profit; (4) obligation;
and (5) position. Figures 6-10 illustrate from actual examples
how appeals can be made to different motives. 

FIGURE 6 
CO-OPERATION 

DEAR SIR: 

Will you be kind enough to take a few moments of your time to 
jot down the answers to the questions on the back of this letter? 

We would greatly appreciate this favor. You will not be asked to 
buy anything; your name will not be used at all.

This letter is one of a small number we are sending out at the 
request of a large manufacturer to get the viewpoint at first hand from some 
of his customers. 

It is not necessary to write a letter. Just check or fill in your answers 
in the space provided and mail the sheet back to us in the enclosed stamped 
envelope. 

Many thanks for your help. 

Very truly yours, 

[An Advertising Agency] 




FIGURE 7 
INTEREST 
DEAR FELLOW-MEMBER: 

The subject of calendar reform has been studied during 1933 by a 
committee of the American Statistical Association, and its majority and minority 
reports appear in the supplement to the Association's Journal for March 1934. 
Before taking action, the Board of Directors wished to pursue the subject 
further, and has appointed the present Committee, with instructions to ascertain 
the considered opinion of the Association's membership on the question of 
calendar reform. 

The problem presented by the defects of the present calendar is 
obviously of importance to the statistical profession, a number of whose mem- 
bers deal with the analysis of time series. 

It is the purpose of this Committee to obtain as far as possible 
the considered opinion of the whole membership. 

To aid in this undertaking, will you kindly fill out the questionnaire 
and return it to the Committee at the earliest possible date. 

Yours very truly, 

[A Committee Chairman of the American 
Statistical Association] 

FIGURE 8 

PROFIT 
DEAR SIR: 

In order to assist farmers in adjusting the production of milk and 
dairy products to prospective demand, the U. S. Department of Agriculture is 
now undertaking to collect more complete information regarding the number 
of cows being milked, the quantities of milk and cream being produced and 
sold and such information regarding the number of heifers being raised, the 
number of cows coming fresh, the quantity of grain being fed and dairymen's 
plans for the future as may be needed to find what changes in production 
may be expected. 

Those who cooperate with the Department by returning each month 
a report for the herd which they own or operate will receive copies of the 
reports. 

On the other side of this page you will find some questions regard- 
ing the quantity of milk now being produced on your farm, the last price 
received, and the quantity of grain being fed. In return for your assistance 
I am enclosing a summary of the outlook for dairying so far as this Department 
has been able to determine the outlook from such information as is now 
available. 

Yours very truly, 

[A Division Chief of the United States 
Department of Agriculture] 




FIGURE 9 

OBLIGATION 
DEAR SIR: 

Those industries which today are making progress toward stabiliza- 
tion know their capacity as well as their demand. To bring some light to the 
capacity situation in worsted spinning, the Research Department reported briefly 
to the trade in December on post-war trends in worsted sales yarn spindles. 
Lack of figures at the time of that release prevented a detailed analysis of 
current worsted spindlage. We are now ready for that step: an industry-wide
inventory of all worsted spinning spindles in the textile mills of this
country as of March 1st, 1931. Leaders of your industry have approved this 
survey. Typical of their attitude is that expressed in a recent letter received 
from Mr. , a copy of which we are glad to submit at this time. 

You are well aware, of course, that a survey of this nature is only 
successful to the degree that all firms in the industry respond. In a word the 
value of this survey to you is very directly tied up with the number of firms 
in the industry who supply the information outlined on the enclosed schedule. 
An early reply on your part will help insure an early report of the results by 
this Department. Please do not hesitate to bring to our attention any problems 
that may arise in filling out this schedule. We will gladly assist you in any 
way possible. 

Sincerely yours, 

[A Director of a Research Bureau] 

FIGURE 10 
POSITION 

FELLOW-ECONOMIST:

In order to help toward straight thinking on the prohibition ques- 
tion, will you please fill in the accompanying questionnaire to the best of your 
ability? 

Please note that many of the answers must represent opinions, but 
that your unbiased judgment as an economist is desired. 

I assure you that your name will not be mentioned in any way 
unless you give permission. 

Please do it now, using the enclosed stamped envelope for mailing 
your reply. 

Sincerely yours, 

[A University Professor] 

Distinct from questionnaires which depend upon making an appeal 
to some voluntary motive are government requests for information to 
which answers are compulsory. The element of compulsion is pre- 
dominant in such a letter, as Figure 11 will show. 




FIGURE 11 

COMPULSION 
SST-1564-CLH 

TREASURY DEPARTMENT 

Office of the Collector of Internal Revenue 

Pittsburgh, Pa. 

July 29th 
1937 

B Company 

M , Pa. 

The records of this office indicate that you filed an application on 
Form SS-4 for an "Employer's Identification Number" under the provisions 
of Title VIII of the Social Security Act and that you were assigned the Iden- 
tification Number above indicated. This Act requires every Employer to file a 
return for each month and pay the tax shown to be due, effective from 
January 1, 1937. 

The records of this office indicate that you have not complied with 
the law in this respect by reason of your failure to file a return for the months 
of January, February, March, April, May, and June, 1937, inclusive. 

You are, therefore, requested to file a separate return on the blank 
forms inclosed for each month above mentioned and to forward the same to 
this office with your remittance for the tax due. An affidavit in explanation 
of your failure to file the returns within the time prescribed by law must 
accompany the returns for the consideration of the Bureau in connection with 
the assertion of the delinquency penalties. (Blank affidavit inclosed.) 

In preparing these returns, your complete Name, Address and your 
Identification Number must be shown thereon as indicated at the top of this 
letter. If, for any reason, you are not subject to the provisions of this Act, 
please advise this office fully, or, if a return was filed by you in another District, 
advise the date and the place where filed, also the serial number stamped 
upon your cancelled check. 

Reply should be made within ten (10) days from the date of 
this notice. 

Very truly yours, 
[An official in the Internal Revenue Office] 

Sometimes several motives for securing replies will be combined. 
The question of what type of appeal to use will depend upon the par- 
ticular circumstances involved. Considerable care should be exercised in 
writing these letters because they must accomplish in an investigation 
by mail what an agent does by personally interviewing prospective 
informants when schedules are used. 




Instructions to Agents: There are two parts of the instructions to 
agents: (1) the definitions of terms and units used in the schedule 
and (2) the general instructions. The definitions were discussed on 
page 100. An illustration of what should be included under general 
instructions is provided in Figure 12, which was prepared for a study 
of home ownership in Buffalo, New York. 

FIGURE 12 

INSTRUCTIONS TO COLLECTING AGENTS 

PRELIMINARY 

You have in your possession a letter identifying you as an agent of 
the President's Conference on Home Building and Home Ownership. Use this 
identification discreetly, remembering always that you must not use compulsion
in seeking replies; on the other hand, you do have back of you the authority 
of a government investigation which should instil confidence and insure to the 
informant that all information given will be treated as absolutely confidential. 
In the use and publication of results no individual information will ever be 
divulged. 

No identification appears on the schedule except the case number. On 
the separate sheet which is provided for the purpose keep an accurate record 
of the exact address from which each case is taken. Agents must be doubly 
careful to keep this special record accurate or the editing of the questionnaires 
may be seriously handicapped. 

This study is confined to families having total income (earned or other) 
not exceeding $3000. You will be unable to determine exact income be- 
fore reaching page 3 of Form 1. A preliminary question, however, should 
determine the availability of the family. If exact tabulation of page 3 should 
show total income slightly in excess of $3000, the schedule will be used. 

Information is to be collected only from families composed of a minimum 
of husband, wife, and one dependent child. 

Where you find more than one family occupying quarters which are quite 
clearly intended for a single family, do not fill out Form 1 but secure the
information on Form 2. 

Information is to be collected only from families who were purchasing 
homes during 1930, but who began paying for their homes prior to January 1, 
1930. 

Information is to be collected only from families in which both parents 
are native born whites.

The most inexcusable errors in compiling data are those arising from 
carelessness on the part of collecting agents. Therefore, 




(a) Write legibly; be neat. 

(b) Be sure that you understand all questions and instructions. 

(c) Before closing your interview, check to avoid omitting any part of
your schedule.

Selection and Training of Staff 

Types of Workers. The number and types of workers needed de- 
pend entirely upon the character of the investigation. If it is a ques- 
tionnaire study, the problem of organizing a clerical staff for the 
preliminary work is in no way peculiar to a statistical inquiry. How- 
ever, when agents are to be used the selection of personnel presents 
a more specialized problem. In both types of investigation, editors 
and a staff of statistical clerks will be needed as soon as collection is 
under way. The selection and training of all these workers becomes a 
part of the process of conducting an investigation, particularly if the 
only available staff consists of students or other inexperienced workers. 

Qualifications and Training. It is essential not only that each of
these workers receives instruction as to his own specific duties, but also
that each acquires a thorough understanding of the entire process. For
example, if the agents receive some training in the method of tabula- 
tion, they will realize why they are asked to make entries in a uniform 
manner. If the editors have even a slight experience in collecting the 
schedules, they understand the difficulty of getting exact information 
and will be more likely to offer their criticisms to the agents without 
arousing antagonism. Testing of the various staff members in different
parts of the work has the additional advantage of discovering which 
ones are best adapted for editing and which ones make the best agents. 

Some individuals may not be successful in making personal contacts 
and getting information from other people but may have an eye for 
detail and a capacity for detecting errors. The latter qualities are de- 
sirable in an editor, but even more important is the ability to appraise 
a schedule as a whole for consistency and validity. This is especially 
true when long and complicated schedules as in a cost-of-living inquiry 
are being collected. A schedule may balance perfectly having no 
specific errors of any kind and yet contain gross inconsistencies or 
omissions that can be detected by an editor with common sense. For
example, in a cost-of-living investigation good editing would imme- 
diately question a schedule in which some income was reported 
from insurance benefits resulting from the death of a member of the
immediate family, but where no item for funeral expenses appeared 
under "expenditure." 
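The kind of consistency test described here can be sketched as a small routine applied to each schedule. The field names (insurance_death_benefit, funeral) and the layout of the schedule as a dictionary are assumptions made for the example; an actual cost-of-living schedule would have its own items.

```python
def editing_queries(schedule):
    """Return the questions an editor might raise about one schedule."""
    queries = []
    income = schedule.get("income", {})
    spending = schedule.get("expenditure", {})
    # A death benefit with no funeral expense suggests an omission.
    if income.get("insurance_death_benefit", 0) > 0 and spending.get("funeral", 0) == 0:
        queries.append("Death benefit reported but no funeral expense entered.")
    return queries


# This schedule balances (2,300 received, 2,300 spent) and still draws a query.
schedule = {
    "income": {"wages": 1800, "insurance_death_benefit": 500},
    "expenditure": {"food": 900, "rent": 600, "other": 800},
}
queries = editing_queries(schedule)
print(queries)
```

A balanced schedule is therefore not necessarily a consistent one; the routine only raises the question, and the decision remains with the editor.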

Agents become the direct representatives of the organization con- 
ducting the inquiry in making contacts with those from whom the data 
are to be obtained. Upon their shoulders rests to a considerable degree 
the responsibility for the success of the undertaking. A good agent 
must have "tact," defined as "intuitive perception, a ready apprecia-
tion of the proper thing to say or do, especially a fine sense of how to 
avoid giving offense." In other words he must be a good salesman in 
order to "sell" to the informant the idea of answering the questions. 

Training in certain techniques must be added to this natural capacity 
before the agent becomes a qualified field representative. First, he 
must thoroughly understand the general purpose of the investigation 
and believe in it himself. He should be able to explain it and con- 
vince the informant of its validity without the necessity of referring 
to his letter of credentials. Agents must always be furnished with 
such credentials for their own protection, but an official letter prac- 
tically never has a persuasive effect on an irate informant even though 
the investigation is being conducted under government authorization. 
Next, the agent must be so familiar with the schedule and all of the 
instructions, definitions, and limitations that he can conduct the inter- 
view and complete the schedule without any hesitation or reference 
to notes. He is never permitted to alter the meaning of a question or
definition and if in doubt on any point he should add full notes de- 
scribing the situation. If an unusual situation not contemplated by 
those who planned the investigation should arise, a well-trained agent 
should be prepared for such contingencies. His function is to secure 
complete information on the unusual case so that final disposal can 
be made by the person in charge of the investigation. In such cases it 
may be advisable not to complete the interview, but to leave the way 
open for a return visit after having consulted the director for advice. 

All of the information which the agent secures, whether written on 
the schedules or given orally in an interview, is completely confidential. 
Inexperienced agents sometimes forget that they are not at liberty to 
discuss collected information with anyone, even with fellow-agents 
and much less with friends. Failure to observe this rule can be ex- 
tremely embarrassing, if information that was given in confidence 
comes back to its origin through third parties, and may have even 
more serious consequences. 




Example of Agent Technique. The following example shows the 
effect of proper and improper agent technique: 

Several years ago an investigation of housing conditions was being 
made in a large city. Although the investigation was sponsored by 
a committee which had been appointed by the President of the United 
States, it was not official nor was anyone compelled to give informa- 
tion to the agents. In spite of the fact that the status of the investiga- 
tion had been explained fully, one of the agents insisted upon answers 
to his questions to a point which caused an irate housewife to call the 
police station. The agent was picked up by a policeman on the house- 
wife's complaint, taken to the station house, and subsequently to the 
office of the police commissioner. At this point the director was called 
to the commissioner's office. The director's explanation convinced the 
commissioner that no crime had been committed, but did not convince 
him that agents should be permitted to annoy housewives any further. 
Fortuitous circumstances entered the case at this point. A detective 
who had been assigned to the case was called in by the commissioner. 
Quite unexpectedly the detective reported that another agent had ap- 
peared at the door of his home the previous evening, that the agent 
had been courteous, his questions inoffensive, and that the detective 
had been entirely willing to give the information requested. The 
commissioner then consented to have the investigation continue pro- 
vided the offending agent were dismissed. 

Perhaps it is superfluous even to point out the value of the second 
agent's work. By the proper approach this man had secured the in- 
formation that he wanted, had created good-will for himself, and 
quite unwittingly had saved the entire investigation. In addition to 
showing how agents may succeed or fail, the example points to the 
desirability of notifying police authorities before sending agents out 
to do house-to-house investigation. 

Supervising the Work 

The foregoing example indicates the necessity for constant super- 
vision on the part of the director while the investigation is in progress. 
He must be available at all times so that any unusual situations can 
be met as they arise. On the other hand every detail of the routine 
plan should be a matter of written record, and each part of it should 
be thoroughly understood by at least one other staff member, so that 
the orderly progress of each step is automatic regardless of the presence
or absence of any one person. These routine steps include: the check- 
ing of the adequacy of the sample as the collection proceeds; a regular 
system for making assignments, accounting for returns and routing 
of schedules to agents, to editors, back to agents if necessary, and 
finally to tabulator; arrangement for check interviews by visit or 
telephone as a test of each agent's ability and integrity; adherence to 
the quota and time schedule originally planned for the investigation; 
issuance of additional or revised instructions to the entire staff when- 
ever necessary; and provision for staff meetings at regular intervals 
for the discussion of difficult points that may arise.
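One of these routine steps, adherence to the quota and time schedule, might be checked along the following lines. The sketch assumes that returns should accumulate evenly over the collection period, which will not hold for every investigation; the district names and figures are invented for the illustration.

```python
def districts_behind(quota, returned, fraction_elapsed):
    """List districts whose returns fall short of an even rate of progress."""
    behind = {}
    for district, target in quota.items():
        expected = target * fraction_elapsed   # assumes even progress over time
        done = returned.get(district, 0)       # no entry means no returns yet
        if done < expected:
            behind[district] = expected - done
    return behind


quota = {"North": 200, "South": 150, "East": 100}
returned = {"North": 110, "South": 60}

# Halfway through the period, South is 15 schedules short and East 50.
shortfall = districts_behind(quota, returned, 0.5)
print(shortfall)  # {'South': 15.0, 'East': 50.0}
```

A record of this kind, kept as collection proceeds, allows assignments to be shifted before the time schedule is endangered.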



SUMMARY 

In the conduct of any specific investigation numerous details arise 
that cannot be discussed in a general textbook. No attempt has been 
made in this chapter to furnish a complete guide for a person undertaking
a statistical investigation in any particular field. Many books¹
devoted solely to the description of research techniques can be con- 
sulted to supplement the statement of principles and methods 
presented here. 

PROBLEMS 

1. Assuming that information on the following subjects is to be obtained 
by direct collection, which of the three types of sources listed in the text 
should be used in each case? 

a) The brands of bread used by families in a city. 

b) The distribution of employed persons in a city according to a classi- 
fied list of occupations. 

c) The extent of use of different types of anti-freeze solution in automobile
radiators.

d) The extent of unemployment of union labor and non-union labor. 

e) The tendency toward the construction of lower-cost houses in urban 
centers. 

f) The distribution of vacant dwellings in a city according to rent level.

g) The effect on sales of milk in a community of an increase of 7 per cent 
in the retail price. 

2. What is the purpose of preliminary testing before starting a direct investigation?



1 A few of these are listed at the end of the chapter. 




3. Explain the difference in wording of questions in a schedule and a ques- 
tionnaire. 

4. Explain which of the following alternative wordings is preferable for a 
questionnaire and why: 

a) (1) What color do you prefer for your next automobile? 

(2) Check the color you prefer for your next automobile: 

green maroon 

blue brown 

grey gun metal 

black other (specify) 

b) (1) Do you consider the advertising statements of local stores more
reliable than statements found in magazine advertising? More Less

Do you consider the statements made in advertising over the radio
more reliable than those found in newspapers? Yes No

Do you feel that a statement in an advertisement is more reliable
than a statement by a clerk in the store? Yes No

(2) Mark in the order you consider them dependable the following 
media of information concerning consumers' goods (mark the 
most dependable 1, etc.) 

magazine advertising 

radio advertising 

newspaper advertising 

statements of clerks in stores 

c) (1) Do any of the following apply to your concern? (Check which.) 

too many salesmen 

sales management inefficient 

sales territory poorly allocated 

sales commissions too large 

(2) Which of the following would be most effective in reducing 
selling expenses in your concern? (Check one.) 

reduction of selling force 

reorganization of sales management 

reallocation of sales territory 

reduction of sales commissions 

5. Define the following terms for use in a schedule or questionnaire. Be sure 
to provide for possible borderline cases: a) a farm; b) a factory; c) an
employed person; d) a department store; e) a radio news broadcast. 

6. Explain the difference between fixed definition units and variable definition 
units. Give three examples of each not taken from the text. Explain the 
need for definition in each of your examples of a variable definition unit.

7. Write a letter to accompany the radio questionnaire of Figure 5, page 105. 

8. The following questionnaire was sent to the subscribers of a magazine by 
the management of the magazine. Write a letter to accompany the ques- 
tionnaire. 

Name Age 

Address City 

Occupation 

Name of Company Position 

Are you the head of a family? Number in family 

Do you own an automobile? Make Year 

Do you own your home? Number of Rooms 

What are your hobbies? 

Where do you spend your vacations?

Do you own a radio? Make 

Have you a telephone? 

Suggestions ....................



9. Summarize the qualifications of a collecting agent. 



REFERENCES 

BOWLEY, ARTHUR L., Elements of Statistics. London: P. S. King and Son, Ltd., 
1920 (fourth edition). 

Chapter III presents rules for the preparation of schedules and several 
examples of direct collection. 

BROWN, LYNDON O., Market Research and Analysis. New York: The Ronald 
Press Co., 1937. 

Chapter 9 contains an excellent statement of the rules to be observed 
in preparing a questionnaire. 

DAY, EDMUND E., Statistical Analysis. New York: The Macmillan Co., 1925. 
Statistical units are discussed on pages 17-23. 

EIGELBERNER, J., The Investigation of Business Problems. New York and 
Chicago: A. W. Shaw Co., 1926. 

Chapters IX, X, and XII deal with the mechanics of direct collection. 

SAUNDERS, ALTA G., and ANDERSON, CHESTER R., Business Reports. New 
York: McGraw-Hill Book Co., Inc., 1929. 

Chapters VIII and IX deal with the collection of data; emphasis on the 
preparation of questionnaires. 



Standards of Research. Des Moines, Iowa: Meredith Publishing Co., 1929. 
Pages 20-26 contain an outline statement of the "Planning of 
Questionnaires." The use of agents is outlined on pages 40-43. 

The Technique of Marketing Research. Prepared for the American Marketing 
Society by the Committee on Marketing Research Technique. New York: 
McGraw-Hill Book Co., Inc., 1937. 

This entire book should be required reading for every person who expects 
to engage in marketing research. 



CHAPTER VII 
EDITING AND PRELIMINARY TABULATION 

TWO IMPORTANT steps, editing schedules and preliminary 
tabulation, follow the collection of data and precede the 
preparation of tables for presenting the collected information. 
Inadequate attention has sometimes been given to these processes be- 
cause they do not require the use of involved techniques. Nevertheless 
an understanding of the methods of editing schedules and of trans- 
ferring information from schedules to the initial tabular form is 
an essential part of the conduct of an investigation. The two processes 
are distinct, hence this chapter has been divided into two parts. 

EDITING SCHEDULES 

As agents' schedules and mail questionnaires are returned they must 
be studied very carefully in order to detect any irregularities in the 
responses. Experience demonstrates that this step is necessary whether 
the collection has been made by agents or by mail, although more 
questions will be answered incorrectly in mail questionnaires than 
in schedules collected by agents. Before any analysis is undertaken 
these errors must be detected by an editor and corrected if possible. 

The editor performs two functions: (1) detecting irregularities in 
the replies and (2) preparing the schedules 1 for tabulation. 

Editing for Irregularities 

There is no fixed order in which the editing should proceed. That 
is within the discretion of the editor. The following order will serve 
in many cases: (1) look for omissions, (2) verify check questions, 
(3) check for inconsistencies, (4) search for errors, and (5) check for 
uniformity between schedules. 

Look for Omissions. Each schedule should be complete. If the 
answers to any questions are missing an attempt should be made to 
get the information either by mail or by a second interview with the 
informant. Failure to obtain the information by these means may 
cause the editor to mark that part of the schedule "no report" or, if 
the missing information is primary, to discard the schedule. In the 
chain-store inquiry cited in the chapter on sampling (pp. 79-81), 
195 schedules were discarded entirely because primary information was 
missing and 50 other schedules were incomplete in some respect. 

1 As used in this chapter the word "schedules" refers to collected information whether 
obtained by agents or through the use of mail questionnaires. 

Verify Check Questions. If the collection form includes answers to 
questions which should verify or check each other and these fail to 
check, the editor must search for collateral information that will indi- 
cate which of the responses is in error. For example, the age of a 
house may be stated as 22 years (in 1936), the date of construction as 
1920, and the initial mortgagee as a bank which was liquidated in 1916. 
The date of construction has apparently been given incorrectly, but 
the editor must not guess about this. If no collateral verification is 
possible, either the schedule must be returned to its author or the 
answers to these questions must be discarded. 
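A first pass over check questions of this kind can be made mechanically. The following Python sketch is an illustration only: the record layout and the field names (survey_year, stated_age_years, construction_year, mortgagee_liquidated_year) are assumptions, not items from any schedule in the text. It reports answers that fail to check but, in keeping with the rule that the editor must not guess, it does not decide which answer is wrong.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HouseSchedule:
    """Hypothetical record holding the answers that should check each other."""
    survey_year: int
    stated_age_years: Optional[int]
    construction_year: Optional[int]
    mortgagee_liquidated_year: Optional[int] = None  # from collateral records


def check_questions(s: HouseSchedule) -> List[str]:
    """Return a list of conflicts; an empty list means the answers check."""
    conflicts = []
    if s.stated_age_years is not None and s.construction_year is not None:
        implied = s.survey_year - s.stated_age_years
        # Allow one year of slack for rounding of the stated age.
        if abs(implied - s.construction_year) > 1:
            conflicts.append(
                f"age implies construction about {implied}, "
                f"but {s.construction_year} was reported"
            )
    if (s.construction_year is not None
            and s.mortgagee_liquidated_year is not None
            and s.construction_year > s.mortgagee_liquidated_year):
        conflicts.append(
            f"initial mortgagee was liquidated in {s.mortgagee_liquidated_year}, "
            f"before the reported construction in {s.construction_year}"
        )
    return conflicts


# The case described in the text: age 22 in 1936, built 1920, bank liquidated 1916.
example = HouseSchedule(1936, 22, 1920, 1916)
for conflict in check_questions(example):
    print("FAILS TO CHECK:", conflict)
```

Both conflicts point toward the construction date, but the program only lists them; the decision remains with the editor and the collateral evidence.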

Check for Inconsistencies. There will often be questions the 
answers to which can occur only in certain combinations or in certain 
sequences. The editor must test these combinations and sequences for 
consistency. Replies to the following two questions sent in by a gaso- 
line and oil service station are inconsistent: 

What disposal do you make of bulk motor oil distributed to you on quota 
and unsold? (check which) 

Throw away 

Sell at lower price 

Return to agency for credit (checked) 

Mix with new oil received 

Allow to accumulate for waste use 

At what price per quart do you sell motor oil? 

Heavy body 31 cents 

Medium body 30 cents 

Light body ? 

Old stock left over 20 cents 

(known in the trade as last year's oil) 

If last year's oil which is unsold is returned to the selling agency for 
credit, then the 20-cent selling price quoted has no meaning. Either 
some old oil is sold or the price quotation on it should be removed. 
The editor must find out which answer is to be amended. 
Search for Errors. Any calculations which are on the schedule 
should be carefully checked. A tabulation of a total and its parts 
should also be verified. Errors which occur in numerical relations can 
usually be corrected by the editor. There are some cases, however, in 
which errors of this kind will require a resubmission of the schedule 
to the maker. 
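The verification of a total against its parts is simple arithmetic. A minimal Python sketch follows; the function name and the sample figures are hypothetical.

```python
def verify_total(parts, reported_total, tolerance=0):
    """Compare a reported total with the sum of its reported parts.

    Returns (agrees, computed_total) so the editor can see the size of any gap.
    """
    computed = sum(parts)
    return abs(computed - reported_total) <= tolerance, computed


# Hypothetical schedule: three departmental figures and a stated total.
agrees, computed = verify_total([1250, 830, 415], reported_total=2485)
if not agrees:
    print(f"Reported total 2485 does not equal the sum of parts, {computed}")
```

A disagreement shows that something is wrong but not whether the total or one of the parts is at fault, which is why some cases must go back to the maker of the schedule.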

Beyond these obvious things there may be others which can be de- 
tected only by a careful study of the answers to all of the questions. 
For example, a research bureau was receiving monthly reports of sales 
from a number of department stores. The sales of one store seemed to 
move opposite to the others in June. After this had happened for the 
third year, the director of the bureau grew suspicious. The difference 
might be the result of a special sale in June, or be due to the handling 
of special seasonal merchandise, but preliminary study of the case failed 
to disclose any reasonable cause for the difference. Finally a direct 
appeal to the co-operating store disclosed the fact that the controller, 
who regularly made out the monthly report, took his vacation in July 
and his substitute who made out the report of June sales misunderstood 
the schedule and reversed the May and June figures. 

Check for Uniformity between Schedules. The editor should check 
for uniform interpretation of all of the questions. He is quite likely 
to find that one or several questions have been misconstrued on some 
of the schedules. These things may not be evident in studying the 
schedules individually but may appear when one question is studied on 
all of the schedules. In an investigation of moving picture attendance 
by students, this question was asked: "How much did you spend for 
moving picture admission last week?" One of the student agents was 
noted for his erratic behavior and on this question he ran true to form. 
Many of his schedules showed an expenditure well above that of other 
schedules. Inquiry elicited from him the fact that he had probably 
asked for expenditures during the past month. All of his schedules 
were dropped from the investigation. 
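Studying one question across all schedules, agent by agent, can be sketched as follows. This is an illustration under stated assumptions: the amounts are invented, and the rule of flagging an agent whose median reply is more than twice (or less than half) the median of all replies is an arbitrary choice, not a rule from the text.

```python
from statistics import median


def agents_out_of_line(records, ratio=2.0):
    """records: list of (agent, amount) pairs for a single question.

    Returns {agent: median amount} for agents whose median differs from the
    overall median by more than the given ratio in either direction.
    """
    by_agent = {}
    for agent, amount in records:
        by_agent.setdefault(agent, []).append(amount)
    overall = median(amount for _, amount in records)
    flagged = {}
    for agent, amounts in by_agent.items():
        m = median(amounts)
        if overall > 0 and (m > ratio * overall or m < overall / ratio):
            flagged[agent] = m
    return flagged


# Hypothetical weekly admission expenditures reported through three agents.
records = [
    ("A", 0.25), ("A", 0.50), ("A", 0.25),
    ("B", 0.50), ("B", 0.25), ("B", 0.35),
    ("C", 1.50), ("C", 2.00), ("C", 1.25),
]
print(agents_out_of_line(records))
```

A flag of this kind is only a signal for inquiry; as in the case described above, the explanation must come from the agent.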

Re-editing 

If the investigation is a large one or the schedule form is complex, 
two editors should go over the schedules independently. The second 
editor will find things which the first one overlooked. In fact the 
schedules will always be somewhat less than perfect. As much time 
and money as possible should be used to improve them. Sometimes 
it will be better to have different editors check different parts of the 
schedule. This plan has the advantage of concentration but is weak 
in that no one person gets a comprehensive view of the schedules as 
a whole. 

The work of each editor should be distinguishable either by the use 
of different colored inks or other distinctive marking so that any cor- 
rections or alterations by an editor can be referred back to their author 
if necessary. It should be a fixed rule that editors do not erase; they 
cross out and substitute in all cases. 

Preparing the Schedules for Tabulation 

After the various irregularities have been adjusted, some steps still 
remain to be taken before the schedules are ready for tabulation. In the 
course of these adjustments changes will have been made on some 
schedules, unusual markings will appear on others. The editor should 
indicate specifically how such items are to be tabulated if there is any 
chance of subsequent misunderstanding. To facilitate the transfer of 
information from editor to tabulator, all final corrections should be 
made in ink of a certain color. 

Finally, the editor should indicate the proper classifications of items 
if they are to be tabulated in a form different from the way they appear 
on the schedule. For instance, if the question "What is your occupa- 
tion?" appears on the schedule and no check list accompanies it, the 
answers will appear in a variety of forms. The editor must mark these 
replies according to the occupational classification to be used in tabula- 
tion. Again, the schedules may show the state from which they come, 
whereas the tabulation is to be made by geographic areas. These areas 
should be marked by the editor. This sort of editing adds to the speed 
and accuracy of the tabulation and makes it unnecessary to employ a 
highly skilled staff in the tabulation process. 
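Marking each schedule with its tabulation class amounts to a lookup. The sketch below is a hypothetical example in Python; the handful of states and the area names are chosen for illustration, and an actual study would fix its own classification.

```python
# Hypothetical classification of states into geographic areas.
AREA_OF_STATE = {
    "New York": "Middle Atlantic",
    "Pennsylvania": "Middle Atlantic",
    "Ohio": "East North Central",
    "Illinois": "East North Central",
}


def mark_area(state: str) -> str:
    """Return the area to be tabulated, or a flag for the editor's attention."""
    return AREA_OF_STATE.get(state.strip().title(), "UNCLASSIFIED")


for reply in ["new york", " Ohio ", "Ontario"]:
    print(repr(reply), "->", mark_area(reply))
```

Replies that fall outside the list are flagged rather than forced into a class, so that the editor, not the tabulator, settles them.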

Sometimes the process of preparing the schedules for tabulation 
involves an intermediate step known as coding. The best example of 
this occurs when mechanical tabulation is employed. Coding of infor- 
mation to be transferred to "punch" cards becomes one of the most 
important steps in the mechanical process which is described in the 
second part of this chapter. 



PRELIMINARY TABULATION 

The methods to be used in transferring information from the col- 
lection forms to preliminary tables will depend upon the size of the 
investigation, the character of the data, and the ultimate form in which 
the results are wanted. The following four methods are available: 
(1) sorting-counting, (2) the use of a tally sheet, (3) the use of a 
work sheet, and (4) mechanical tabulation. 

Sorting-Counting 

The sorting-counting process can be used to advantage when the 
data to be tabulated are relatively simple so that each case can be put 
on one small card. The cards can be sorted and sub-sorted into piles 
according to any desired plan of classifying the data. The number of 
cards in each pile can then be counted and recorded on a tabular form 
prepared for the purpose. 

The card used in an investigation of vacant dwellings in Buffalo, 
New York, has been reproduced as Figure 13. The purpose of the 
investigation was to discover the amount of vacancy in different sec- 
tions of the city, the extent of vacancy in single and multiple family 
dwellings, and whether more or less vacancy existed in buildings de- 
signed jointly for business and dwelling occupancy. The sampling 
method was used, every sixth census enumeration district being can- 
vassed. About 24,000 of the 140,000 families in the city were included 
in the study. A card was filled out for each dwelling place; hence the 
number of cards equalled the total number of places which either 
were or could have been occupied by a family. Thus for a three-family 
house three cards would have been turned in with the same address, 
each card recording the status of a single flat in the building. 

The cards were kept separate by enumeration districts. After they 
had been edited and the number of cards from each district recorded 
on a master sheet, the cards were ready to sort. They were first dis- 
tributed in five piles according to the number of dwelling places in 
the building. Each pile was then sorted into residential or combination 
residential and business. Each of these ten piles was sorted according 
to whether the dwelling place was occupied or vacant. The number of 
cards in each of the twenty piles was then counted and the results 
entered in the proper row of Table 12. 
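The three successive sorts described above produce 5 x 2 x 2 = 20 piles. A minimal Python sketch of the same sorting-counting follows; the dictionary keys and category labels are assumptions made for the example, and the four cards are invented.

```python
from collections import Counter
from itertools import product

SIZES = ["one", "two", "three", "four", "over four"]
USES = ["residential", "combination"]
STATUSES = ["occupied", "vacant"]


def sort_and_count(cards):
    """Count the cards of one enumeration district in each of the 20 piles."""
    counts = Counter((c["size"], c["use"], c["status"]) for c in cards)
    # Report every pile, including the empty ones, so the row is complete.
    return {pile: counts.get(pile, 0) for pile in product(SIZES, USES, STATUSES)}


# Hypothetical cards from one district.
cards = [
    {"size": "one", "use": "residential", "status": "occupied"},
    {"size": "one", "use": "residential", "status": "occupied"},
    {"size": "one", "use": "residential", "status": "vacant"},
    {"size": "two", "use": "combination", "status": "occupied"},
]
row = sort_and_count(cards)
print(row[("one", "residential", "occupied")], sum(row.values()))
```

The sum of the twenty pile counts must equal the number of cards recorded for the district on the master sheet.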



FIGURE 13 

COLLECTION CARD USED IN RESIDENTIAL VACANCY INVESTIGATION 
IN BUFFALO, NEW YORK 

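Serial No. ..........

Address ..........

Ward ..........  Tract ..........  Enumeration District ..........

No. of Dwelling Places in Building: 

One ......  Two ......  Three ......

Four ......  Over Four (give number) ......

Occupied ......  Vacant ......

Residential ......  Combination ......

Agent ..........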



The cards were then collected into a single pack, shuffled, and 
turned over to another tabulator to be sorted, counted, and entered in- 
dependently on a duplicate of Table 12. The cards were then held in 
distributed form until a third person checked the two records together 
and checked the total of the row with the number of dwelling places 
in that district as shown on the master sheet. If the two records agreed, 
and the totals checked with the original count, they were considered 
to be correct. If not, the cards were recounted until the discrepancy 
came to light. A similar procedure was followed in each enumeration 
district. As indicated in Table 12, the results were subtotaled by 
tracts, if there was more than one enumeration district in a tract, and 
in every case by wards. This plan made it possible to use whichever 
of the geographic subdivisions was desired in subsequent work. 
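The comparison made by the third person can be sketched in Python as below. The pile labels and counts are hypothetical; the sketch assumes each tabulator's row is held as a mapping from pile to count.

```python
def check_row(first, second, master_count):
    """Compare two independent counts of one district with each other
    and with the number of cards recorded on the master sheet."""
    differing = sorted(
        pile for pile in set(first) | set(second)
        if first.get(pile, 0) != second.get(pile, 0)
    )
    total_agrees = sum(first.values()) == master_count
    return differing, total_agrees


# Hypothetical rows from two tabulators; the master sheet shows 177 cards.
first = {"one/res/occupied": 169, "one/res/vacant": 1, "one/comb/occupied": 7}
second = {"one/res/occupied": 168, "one/res/vacant": 1, "one/comb/occupied": 7}
differing, total_agrees = check_row(first, second, master_count=177)
print("Piles to recount:", differing)
```

Only the piles named need to be recounted, which is the advantage of holding the cards in distributed form until the check is complete.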

[TABLE 12. Record of the sorting-counting of the Buffalo vacancy cards. 
The table was printed sideways and its entries are not legible in this 
copy. Legible headings: "Dwelling Places in Buildings Accommodating" 
two, three, four, and over four families, each with a "Combination" 
subheading. According to the text, each row is one enumeration district, 
with subtotals by tracts and wards.] 

Tally Sheet 

The use of a tally sheet is the reverse of sorting-counting in that 
the schedule cards or sheets are not separated into piles according to 
the various classifications. Instead a blank form, or several of them 
in a complex investigation, is made up to conform to the classifications 
of the data. The information is then scored on the blank form as it is 
read from the collection form. One person should read from the 
schedules while another person records on the tally sheet. There is 
an advantage in having several persons record the information 
simultaneously on separate sheets in order to secure one or more checks 
from the same reading. This, however, provides no check on the 
accuracy of the reading, so that perhaps the safest procedure is two 
independent readings and recordings. The weakness of the method lies 
in the fact that an error can be corrected only by rereading all the 
schedules. A device which partially overcomes this weakness is to 
divide the schedules into piles and then subdivide the tally sheet into 
corresponding parts. An error can then be localized in one of the piles 
and the rereading confined to that one. 

If this method were to be used in tabulating the information of 
the vacancy survey in Buffalo, the tally sheet would have the form 
shown in Figure 14. One person would read off first the number of 
dwelling places in the building, then whether the building was resi- 
dential or combined business and residential and then whether occupied 
or vacant. The person doing the tallying would locate the proper block 
as the information was read and would register one stroke for each 
dwelling place. The subtotals by enumeration districts, tracts, and 
wards would be recorded as indicated on the tally sheet. 

If there is too much cross-classification of the data, obviously this 
method becomes cumbersome. In that case it is probably better to 
abandon the tally sheet and use sorting-counting or the work-sheet 
method explained in the next section. The tallying process may be 
simplified, however, by sorting the schedules first into their major 
classifications and then tallying by subgroups. In Figure 14 the cards 
would be separated according to enumeration districts before the tally- 
ing was done. 

The tally-sheet method is often the most desirable to use in taking 
information from a published source. For example, if one wished to 
record the number of industries in an area according to number of 
employees, the best method would be to make up a tally sheet with 
a classification by number of employees and tally the industries as they 
were read from the Census of Manufactures. 
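Tallying from a published list by size class can be sketched in Python as follows. The class limits, labels, and employee counts are hypothetical, chosen only to show the method; they are not classes used by the Census of Manufactures.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical classification by number of employees (lower class limits).
LOWER_LIMITS = [1, 6, 21, 51, 101]
LABELS = ["1-5", "6-20", "21-50", "51-100", "101 and over"]


def size_class(employees):
    """Locate the class whose lower limit is the largest one not exceeded."""
    if employees < LOWER_LIMITS[0]:
        raise ValueError("an establishment must report at least one employee")
    return LABELS[bisect_right(LOWER_LIMITS, employees) - 1]


def strokes(n):
    """Render a count as tally strokes in groups of five."""
    fives, rest = divmod(n, 5)
    return " ".join(["/////"] * fives + (["/" * rest] if rest else []))


def tally(employee_counts):
    """Return (class, count, strokes) for every class, in order."""
    sheet = Counter(size_class(n) for n in employee_counts)
    return [(label, sheet[label], strokes(sheet[label])) for label in LABELS]


for label, count, marks in tally([3, 6, 20, 21, 150, 75, 5]):
    print(f"{label:<14}{marks:<10}{count}")
```

As with the hand method, an error in reading cannot be traced afterwards, since the tally keeps the count but not the identity of each item.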

[FIGURE 14. TALLY SHEET FOR RECORDING RESIDENTIAL VACANCY IN BUFFALO, 
NEW YORK. The ruled form is not reproducible from this copy. It covers 
Ward 1, with one column for each tract and enumeration district and a 
total column. The rows run from one-family to over-four-family 
buildings, each divided into residential and combination, occupied and 
vacant. Each block holds tally strokes and their count; for example, 
occupied one-family residential dwelling places are tallied as 169, 63, 
28, and 93 in the four districts, a total of 353.] 

The Work Sheet 

The purpose of a work sheet is to bring the information together 
in more convenient form than the schedules, so that it will be ready 
for further tabulation and analysis. After having been edited, the 
answers recorded on all the schedules are transferred to a single sheet 
or several sheets, depending on the size of the investigation. The 
headings of these sheets will correspond to the questions on the col- 
lection schedules. Thus any tabulation which can be obtained from 
the original collection forms can be taken equally well from the work 
sheet. 

In chapter VI, page 105, a questionnaire was presented concerning 
the college market for radios. Figure 15 is a proposed work sheet 
for recording this information. One row of the work sheet is used 
for each questionnaire, thus the identity of the information is preserved. 

At least one sorting of the schedules can be made prior to recording 
any of the information on work sheets. In this case, the first obvious 
question on which to sort is whether the student now owns a radio. 
Second, from the non-owners, those who have not expressed the inten- 
tion of purchasing a radio during 1936 may be eliminated entirely. 
There is no information that needs to be tabulated regarding this 
latter group, except a count of the total number of such cases. The 
complete form of work sheet shown in Figure 15 is needed only for 
the schedules of present radio owners. The non-owners who expect to 
buy can be recorded on a much shorter form that includes only the last 
three sections which deal with expected purchases in 1936. This pro- 
cedure not only saves time in recording and space on the work sheets, 
but simplifies the task of classifying the information that will later 
be taken from these work sheets. 

Each section on the work sheet represents one question on the 
schedule and must include enough separate columns to accommodate 
every possible reply that is expected to that question. In transferring 
the information from the schedule, a check mark or other equivalent 
symbol is made in one column of each section. Thus the number of 
cases in any column can be totaled easily by counting the check marks 
in that column. The column subtotals within each section should add 
to the same figure in every section, namely the total number of 
schedules being tabulated. 
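The check on section totals can be illustrated with a short Python sketch. The two sections and their column headings are hypothetical stand-ins and are not the headings of Figure 15; a missing reply is posted to the last column of its section, which is an assumption of this sketch.

```python
# Hypothetical sections: one per question, one column per expected reply.
SECTIONS = {
    "owns_radio": ["yes", "no", "no answer"],
    "expects_to_buy": ["yes", "no", "don't know"],
}


def post(worksheet, answers):
    """Add one row: a check mark (1) in exactly one column of each section."""
    row = {}
    for section, columns in SECTIONS.items():
        reply = answers.get(section, columns[-1])  # last column takes missing replies
        if reply not in columns:
            raise ValueError(f"unexpected reply {reply!r} in section {section!r}")
        for column in columns:
            row[(section, column)] = 1 if column == reply else 0
    worksheet.append(row)


def column_totals(worksheet):
    """Count the check marks in every column."""
    totals = {}
    for row in worksheet:
        for key, mark in row.items():
            totals[key] = totals.get(key, 0) + mark
    return totals


def section_totals(worksheet):
    """Add the column totals within each section."""
    sums = {section: 0 for section in SECTIONS}
    for (section, _column), n in column_totals(worksheet).items():
        sums[section] += n
    return sums


worksheet = []
post(worksheet, {"owns_radio": "yes", "expects_to_buy": "no"})
post(worksheet, {"owns_radio": "no", "expects_to_buy": "yes"})
post(worksheet, {"owns_radio": "no"})
print(section_totals(worksheet), "schedules:", len(worksheet))
```

Because each row carries exactly one mark per section, any section whose total differs from the number of schedules reveals an omitted or doubled mark.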

In planning the headings of the sheet, it must be anticipated that 
for practically every question there are likely to be unexpected replies 
or unusual cases requiring notes of explanation, as well as instances 
where no answer appears. It is necessary therefore to provide extra 
columns in nearly every section, such as those in Figure 15 marked 
"Notes," "Other" and "Don't Know." By checking "Don't Know," 
a count of such cases can be included in the check total for that section. 
All written notes are confined to "Notes" and "Other," and do not 
interfere with the count of check marks. 

[FIGURE 15. PROPOSED WORK SHEET FOR QUESTIONNAIRE USED IN COLLEGE MARKET 
INVESTIGATION, FIGURE 5, PAGE 105. The ruled form is not legible in this 
copy. Legible headings include "Radio Section" and "Date Bought or 
Built."] 

The simple process of totaling single columns becomes more com- 
plicated when cross-relationships are wanted. For example, if the 
make of the present radio were to be related to the make of the radio 
the student expected to buy during 1936, a two-way table would be 
prepared and the tallying method used. Only those who answered 
"yes" to the question, "Will you purchase a radio before 1936?" 
would be included. The completed work for 100 cases might appear as 
shown in Table 13. The facts could be read in the form shown, or 
as a final table by substituting figures for the tally marks. 

TABLE 13 

PROPOSED TALLY SHEET USING PART OF INFORMATION RECORDED 
IN WORK SHEET, FIGURE 15, HYPOTHETICAL DATA SHOWING 
RELATION BETWEEN MAKE OF RADIO COLLEGE STUDENTS 
OWN AND MAKE THEY EXPECT TO PURCHASE 

MAKE OF         MAKE OF RADIO EXPECT TO PURCHASE
RADIO OWNED     A      B      C      D      E      F      G      H      Other  No Information  Total

A               ///    /      -      -      //     -      /      /      /      //              11
B               -      //     /      /      -      -      /      -      /      ///             9
C               /      //     -      /      -      //     -      /      /      -               8
D               /      -      -      //     -      /      -      -      -      /               5
E               -      /      /      /      ///    -      /      -      /      -               8
F               -      /      -      /      -      ///    -      /      -      -               6
G               /      //     -      -      /      -      ////   -      /      /               10
H               -      -      /      -      //     -      -      //     /      -               6
Other           //     /      -      ///    /      -      /      /      ////   //              15
Home made       /      /      -      -      /      /      -      /      -      //              7
No information  -      -      -      //     -      /      /      -      /      ///// /////     15

Total           9      11     3      11     10     8      9      7      11     21              100




After this has been done, it can readily be seen that the collected 
information is no longer in a preliminary state. The table represents 
a selection and combination of certain parts of the original data and 
can be read as follows: 



a) Fifteen per cent did not state the make of radio owned and 
21 per cent did not state the make of radio they would purchase. It is 
difficult to draw any conclusions from the remainder of the table with 
so much information missing. 

b) Radio "C" seemingly is in disrepute. None of the present own- 
ers would repurchase it and only three would turn to it from other 
makes. 

c) While there is considerable evidence of shifts in consumer pref- 
erence, the radio owned at present has some advantage over competing 
products except in the case of "C." 

d) At least five of the students who built their present sets would 
not do so again. 
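A two-way tally of this kind, and readings such as a) through d), can be sketched in Python. The eight pairs below are invented for the example and are not the figures of Table 13; None stands for "no information."

```python
from collections import Counter


def cross_tabulate(pairs):
    """pairs: list of (make owned, make expected to purchase).

    Returns the cell counts with the row and column totals.
    """
    cells = Counter(pairs)
    row_totals, col_totals = Counter(), Counter()
    for (owned, wanted), n in cells.items():
        row_totals[owned] += n
        col_totals[wanted] += n
    return cells, row_totals, col_totals


# Hypothetical replies.
pairs = [
    ("A", "A"), ("A", "A"), ("A", "B"), ("B", "B"),
    ("B", None), ("C", "A"), (None, "B"), ("C", "B"),
]
cells, rows, cols = cross_tabulate(pairs)

n = len(pairs)
missing_owned = rows[None] / n           # share not stating make owned
missing_wanted = cols[None] / n          # share not stating intended make
repurchase_c = cells[("C", "C")]         # owners of "C" who would buy it again
repeat_buyers = sum(c for (o, w), c in cells.items() if o is not None and o == w)
print(missing_owned, missing_wanted, repurchase_c, repeat_buyers)
```

The diagonal cells, where the make owned and the make expected are the same, carry the evidence on repeat purchases; the "no information" row and column measure how much of the table is unusable.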

Other combinations of data can be made from the information on 
the work sheet, using tally forms similar to that shown in Table 13. 
The purpose for which the information was gathered will determine 
what forms to use. Note, however, that the work sheet itself is in no 
sense a final form for the data but merely an intermediate device. It is 
seldom published even when a complete record of the statistical analy- 
sis is included along with the report of an investigation. 

Mechanical Tabulation 

When a large number of schedules is to be analyzed the task 
of tabulation becomes enormous. Likewise when a great amount of 
cross-tabulation is necessary, even though the number of cases is not 
so large, the task of preparing tables is likely to become the "bottle- 
neck" of the investigation. Under either of these circumstances the 
present practice is to abandon hand tabulation in favor of the use of 
machinery designed for the purpose. At no other point is the statistician 
so much favored by the developments of the machine age as in tabula- 
tion. Equipment is available to perform quickly and accurately the 
steps of sorting, counting, cross-tabulating, and recording in columnar 
form. 

These advantages have led a great many business concerns to install 
mechanical systems for the maintenance of records of current opera- 
tions. The variety of uses made of the "punch card" system for this 
purpose is illustrated by the following examples: bad debt losses of 
members of a retail credit association; broker's record of security deal- 
ings with customers; merchandise control of a mail-order house; rec- 
ords of premium payments of a life insurance company; service record 
of employees of a shipbuilding company; deliveries to individual stores 
from the central warehouse of a chain grocery company; stock control 
in the warehouse of a chain grocery company. 

Principles. The basic principle of machine tabulation is that a 
hole punched in a card represents by its horizontal and vertical position 
a certain statistical fact. It becomes a permanent record that can be 
used in tabulation at any time by running the card through a machine. 

The first machine developed for this purpose was the "sorter." This 
machine will sort a pack of punched cards into numbered compart- 
ments according to any one set of information. Further refinement 
has led to an attachment which will count the number of cards going 
into each compartment; sorting-counting has thus been reduced to a 
single operation. 

The next step was the invention of the "tabulator," a machine that 
operates at a more complex level. After the cards have been sorted, 
the tabulator can add the amounts recorded on each and furnish a 
printed record of the total. For example, if cards were punched show- 
ing the weekly wage rates of a firm's employees, each card representing 
one employee, the tabulating machine could be set so that it would give 
a printed record of the number employed at each wage rate, the total 
earnings of each group, the total number of employees, and the total 
weekly payroll with a single running of the cards. 
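The result of the wage-rate run described above can be imitated in a few lines of Python. This is a sketch of the arithmetic only, not of the machine; the weekly rates are hypothetical and are given in whole dollars.

```python
from collections import defaultdict


def tabulate_payroll(weekly_rates):
    """weekly_rates: one entry per employee (one 'card' each).

    Returns (lines, total_employees, total_payroll), where each line is
    (rate, number employed at that rate, total earnings of the group).
    """
    groups = defaultdict(int)
    for rate in weekly_rates:
        groups[rate] += 1
    lines = [(rate, n, rate * n) for rate, n in sorted(groups.items())]
    total_employees = sum(n for _, n, _ in lines)
    total_payroll = sum(amount for _, _, amount in lines)
    return lines, total_employees, total_payroll


lines, employees, payroll = tabulate_payroll([18, 22, 18, 25, 22, 18])
for rate, n, amount in lines:
    print(f"${rate:>3} x {n:>2} = ${amount}")
print(f"{employees} employees, weekly payroll ${payroll}")
```

All four results come from a single pass over the records, which corresponds to the single running of the cards.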

Steps in the Process. Probably the only way to understand fully 
what the machines can do is to see them in operation. 2 It will be 
worthwhile, however, to present in some detail those parts of the process 
which receive the least attention in a practical demonstration: (1) prep- 
aration of a code; (2) transfer of collected information to a code 
sheet; (3) punching the cards; (4) sorting-counting; (5) tabulation 
of numerical information; (6) cross-tabulation; (7) recording in tables. 
Some types of data will not require the use of all of the steps, but 
steps 1, 3, 4, and 7 will always be necessary. 

The code: The preparation of the code to be used in transferring 
information from long-hand forms to a set of holes punched in cards 
such as those shown in Figure 16 requires considerable skill. Each card 
usually represents one schedule or other individual record. Each ques- 
tion on the schedule is represented by one or more vertical columns on 
the card, while the numbers 0-9 in the column signify the various pos- 
sible answers to that question. 

2 These machines are manufactured by the Electric Accounting Machine Division of 
the International Business Machines Corporation and the Tabulating Machine Division 
of Remington Rand Inc. The authors have found local representatives of both companies 
willing to demonstrate the use of the equipment either to individuals or to groups of 
students. Seeing the machines at work is vastly superior as a teaching device to mere 
description of their operation even though aided by pictures. 

FIGURE 16 

PUNCHED CARDS FOR MECHANICAL TABULATION 

A. 80-column card (punched as coded in Figure 19)* 

[Card image not reproducible in this copy.] 

* Reproduced through the courtesy of the Electric Accounting Division of the International 
Business Machines Corporation. 

B. 90-column card *† 

[Card image not reproducible in this copy.] 

* The 90-column card is made possible by dividing a 45-column card horizontally, allowing 
six spaces in each half column. The machines can be set to use either the upper or lower half 
of the card. The six spaces are used for 11 numbers by means of a combination punch, e.g., in 
column 1 the space above the line is punched for zero; if the 1 space alone is punched as in 
column 2 it represents 1; 1 and 9 as in column 3 represents 2; etc. 

† Reproduced through the courtesy of the Tabulating Machine Division of Remington 
Rand Inc. 

The actual work of preparing a code can be visualized from the 
example shown in Figures 17 and 18. Figure 17 is a reproduction of 
page 1 of a schedule used in collecting housing information in Buffalo, 
New York, in 1930. Figure 18 is a reproduction of the instructions for 
coding the answers to questions A-1, A-2 and A-3 (a), (b), (c), and 
(d). Columns 1-3 make a direct transfer of the serial numbers of the 
schedules to the card. Column 4 shows the simplest type of transfer 
of non-numerical information to the card. The note attached to the 
code for column 4 indicates that the information in question A-2 was 
not coded. This was omitted because only a few of the houses in the 
study showed any variation in construction material. Column 5 involves 
the same transfer of non-numerical information as column 4, although 
not so easy to record because each number in the column stands for a 
combined occurrence of cellar and attic. Column 6 is a simple transfer 
of numerical information. Column 7 is a combination transfer like 
column 5. Columns 8 and 9 use a somewhat complicated plan for 
recording the other rooms in a house. For example, four in column 8 
and seven in column 9 means that the house contained a dining-room, 
an entrance hall, an inclosed porch, an open porch, and one other room. 
In the same way other numbers in the two columns indicate various 
combinations of rooms in a house. Column 10 gives a summary of 
the detailed information recorded in columns 7, 8, and 9. The note 
appended to the code for column 10 is necessary as a guide for the 
coder. 

In columns 7 and 10 the symbols X and B appear. The machines 
will sort on 12 items in a column since two extra punches can be made 
in the upper margin; therefore there is provision for two extra items 
in a column if necessary. This was done in column 7 to allow 
for two additional combinations of bedrooms and baths. Likewise in 
column 10 the largest house in the study contained 11 rooms and a 
few schedules were indefinite on the total number of rooms. 
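
For a reader who thinks in terms of present-day programming, the use of the twelve punching positions in column 10 may be sketched as follows. This is only an illustration in Python: the function name `code_total_rooms` is hypothetical, and the use of the zero position for ten rooms is an assumption made for the sketch.

```python
# Punch positions available in one card column: the ten digits plus the
# two extra positions in the upper margin, called X and B in the text.
POSITIONS = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "X", "B"]


def code_total_rooms(rooms):
    """Return the column-10 punch for a total room count (None = unknown)."""
    if rooms is None:
        return "B"            # indefinite schedules use the 12th position
    if 1 <= rooms <= 9:
        return str(rooms)     # direct transfer of the number
    if rooms == 10:
        return "0"            # assumption: zero stands for ten rooms
    if rooms == 11:
        return "X"            # largest house in the study, 11th position
    raise ValueError(f"no punch position provided for {rooms} rooms")


if __name__ == "__main__":
    for rooms in (6, 10, 11, None):
        print(rooms, "->", code_total_rooms(rooms))
```

A count that has no position in the code raises an error, which corresponds to the coder discovering that the code must be revised before punching can proceed.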

It is not sufficient to prepare a code that will provide for all the 
information on the schedule, but the code should be so arranged that 
the desired tabulations can be taken from the machines with a minimum 
of sorting and cross-tabulating. In short, the person preparing the code 
must be familiar with all of the machine operations as well as with 
the method of subsequent analysis. 

FIGURE 17 
THE PRESIDENT'S CONFERENCE ON HOME BUILDING AND HOME OWNERSHIP 

Form One (Partial Reproduction) 
Case No. 

Information from Home Purchasing Families 

A. Description of house 

   1. Type of construction 
         Single house            Income house 
         Two-family house        Other (specify) 

   2. Material 
         Superstructure:         Cellar:              Roof: 
         Frame                   Stone                Shingle 
         Brick veneer            Tile                   (a) Plain 
         Brick                   Concrete block         (b) Treated 
         Tile stuccoed           Other (specify)      Fibre shingle 
         Other (specify)                              Slate 
                                                      Other (specify) 

   3. Rooms and space included 
         (a) Is there a cellar      Attic: Finished      Unfinished 
         (b) How many floors including cellar and attic 
         (c) Is there a separate dining room      dining nook or alcove 
             kitchen      pantry      laundry room      entrance hall 
         (d) Number of bedrooms      bath rooms      lavatories 
             glass enclosed porches      screened porches      open porches 
             other rooms (including living room, den, library, 
             play room, sewing room, etc.) 
         (e) Dimensions of house: Front      Depth 
         (f) Size of lot occupied by house: Front      Depth 
         (g) Value of lot without house: 
                At time of purchase 
                Present assessment 
         (h) Corner lot 

   4. Garage 
         (a) Size: One-car      Two-car      Three-car 
         (b) Construction: 
                Frame      Concrete block 
                Brick      Other (specify) 

FIGURE 18 
INSTRUCTIONS FOR CODING DATA ON HOME BUILDING AND HOME OWNERSHIP 
IN BUFFALO, NEW YORK, 1930 

(Reproduction of columns 1-10) 

Column 1-3. Serial number of schedule 
      001 = 1 
      526 = 526 
      999 = 999 

Column 4. Type of house 
      0 Single house            2 Income house 
      1 Two-family house        3 Three-family house 

      (NOTE: Although material about the type of construction of the house will 
      not be coded, the coder is instructed to keep a list of the schedule number of 
      every house that is not of frame construction and to note the exact type of 
      construction of these exceptions.) 

Column 5. Cellar and attic 
      0 No cellar and unfinished attic      3 Cellar and finished attic 
      1 No cellar and finished attic        4 Cellar and no attic 
      2 Cellar and unfinished attic         5 No cellar and no attic 

Column 6. Number of floors in house 
      1 One floor               3 Three floors 
      2 Two floors              etc. 

Column 7. Number of bedrooms and bathrooms 
      1 1 bedroom  and 1 bath        7 1 bedroom  and   baths 
      2 2 bedrooms and 1 bath        8 2 bedrooms and   baths 
      3 3 bedrooms and 1 bath        9 3 bedrooms and   baths 
      4 4 bedrooms and 1 bath        0 4 bedrooms and   baths 
      5 5 bedrooms and 1 bath        X 5 bedrooms and   baths 
      6 6 bedrooms and 1 bath        B 6 bedrooms and   baths 

Column 8-9. Other rooms in house 
      See Appendix 1 for code number to be used. 

Column 10. Total number of rooms in house 
      1 1 room                  0 10 rooms 
      2 2 rooms                 X 11 rooms 
      3 3 rooms, etc.           B Unknown 

      Rooms to be counted in computing total for Column 10 

      Include:                                   Exclude: 
      Dining-room          Other rooms:          Dining nook 
      Kitchen                Living-room         Pantry 
      Entrance hall          Den                 Laundry room 
      Bedroom                Library             Bathroom 
      Inclosed porches       Playroom            Lavatories 
                             Sewing room         Open and screen porch 

For example, a code may be required that will designate each of 
the 48 states and the District of Columbia. It is obvious that two 
columns are needed in order to include so many items. Experience 
shows that to use such a code as Alabama-00, Arizona-01, 
Arkansas-02, etc., is not efficient. It is better to use one column for 
geographical subdivisions and the second for the states in each as 
follows: 

      Maine            00          Connecticut      05 
      New Hampshire    01          New York         10 
      Vermont          02          New Jersey       11 
      Massachusetts    03          Pennsylvania     12 
      Rhode Island     04          etc. 

Then, if only the major subdivisions are needed in a table, it will be 
necessary to sort on only the tens column instead of both. 
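
The advantage of the two-column geographical code can be sketched in Python. The sketch is illustrative only: the function names and the sample pack of cards are hypothetical, and only a few of the states listed above are included.

```python
from collections import Counter

# First digit = geographical subdivision, second digit = state within it.
STATE_CODES = {
    "Maine": "00",
    "Vermont": "02",
    "New York": "10",
    "New Jersey": "11",
    "Pennsylvania": "12",
}


def count_by_subdivision(cards):
    """One sort on the tens column is enough for the major subdivisions."""
    return Counter(code[0] for code in cards)


def count_by_state(cards):
    """A count by state needs both columns of the code."""
    return Counter(cards)


if __name__ == "__main__":
    # Hypothetical pack of cards, each carrying a two-digit state code.
    pack = [STATE_CODES[s] for s in
            ("Maine", "New York", "Vermont", "New York", "Pennsylvania")]
    print(count_by_subdivision(pack))
    print(count_by_state(pack))
```

With a purely alphabetical code such as Alabama-00, Arizona-01, the first digit carries no meaning of its own, so no single-column count of this kind would be possible.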

Other precautions can be taken to reduce to a minimum errors 
in transcribing data to the code sheet as well as in punching the cards. 
Whenever possible the number used in the code should correspond 
to the number written on the schedule. In Figures 18 and 19 it will 
be observed that this has been done for column 6, but not for column 4. 
In the latter case it was considered preferable to take off the answers 
in the same order in which they appeared on the schedule, but "income 
house" could have been placed first in preparing the schedule and 
coded as 0, one-family as 1, two-family as 2, etc. 

For the same reasons that determine the inclusion of extra columns 
in a work sheet, it is desirable to make allowance in the code for "no 
answer" or special cases which may require listing by hand. As far as 
possible the same number should be used in each column for such 
answers, the 11th and 12th positions being used for this purpose. 
They are called "X" and "B," or "X" and "Y." 

The code sheet: The complete code sheet reproduced as Figure 19 
was used as an intermediate step between the transfer of the informa- 
tion on housing from the collection schedule (partially reproduced in 
Figure 17) to the punched cards. The sheet is divided into fields and 
each column is labeled to conform to the descriptions in the code 
(partially reproduced in Figure 18). The entries in each row of the 
body of the sheet are the code numbers which stand for the information 
contained in the schedule whose serial number appears at the left. 
Schedule No. 669 is recorded on Figure 19 to illustrate the procedure. 
This information is also reproduced on the 80-column card of 
Figure 16-A. 

The construction of the code sheet greatly facilitates the punching 
of the cards, but is not an absolutely necessary step in the process. 
The transfer of the information from written to numerical form can 
be done on the margin of the collection schedule. More time will be 
spent by the punch operator in taking the information from the margin 
of the schedule than from the code sheet. On the other hand the 
transfer from schedule to code sheet will require more time than coding 
on the margin of the schedule. Which method to use must be 
determined in each case in the light of the particular circumstances 
involved. The three primary factors to consider are time, expense, and 
accuracy. 

For continuous recording of operations by business concerns the 
better plan usually is to arrange the original record so that the coding 
can be done on it directly without the use of a code sheet. The punch 
operator becomes so familiar with the code after a few months that 
the information can be punched directly from the original records 
without further reference to the code instructions. 

Punching the cards: The punching machine is just a simple device 
with 12 numbered keys fixed over a movable carriage containing the 
card. As a hole is punched in a column according to the coded infor- 
mation the card advances automatically to the next column in position 
for punching. These punches may be operated by hand or by electricity. 

Sorting-counting: This operation is the fastest part of the process. 
The machine sorts and counts several hundred cards a minute. In one 
type of machine the card itself acts as an insulator breaking an 
electric circuit. When the brush carrying the current comes to a hole 
in the card, the electric circuit is completed. The completed circuit in 
turn opens the guiding device to a compartment whose number cor- 
responds to that of the hole in the card, and the card drops into the 
compartment as it passes along on a conveyor. The other type of 
machine performs the same process except that the cards are picked 
out by pins that drop into the holes, instead of by a completed electric 
circuit. Both machines also have attachments that will count the 
number of cards falling into each compartment. Since the machine has 
only 12 pockets, it is obviously not possible to sort according to more 
than one column at a time. When it has been necessary to utilize two 
or more columns in a single code (as in Figure 19, columns 23 and 
24) the cards must be run through separately for each column, if a 
complete sort is wanted. In these two columns the cost of the house 
is recorded in hundreds of dollars, the last two ciphers having been 
dropped. Thus if a count is wanted by $100 groups, the cards are 
first sorted on column 23, in $1,000's, and each $1,000 is then 
re-sorted and counted on column 24 in $100 groups: $2,000, $2,100, 
$2,200, etc. 

FIGURE 19 

CODE SHEET 
Used in Transferring Information From Questionnaire, Figure 17 
(Code filled in for schedule No. 669) 

[Illustration: ruled code sheet. Legible column headings, in order: 
SERIAL NUMBER; TYPE; CELLAR; NO FLOORS; NO BEDROOMS; OTHER ROOMS; 
NO ROOMS; HOUSE FRONT; HOUSE DEPTH; LOT FRONT; LOT DEPTH; GARAGE; 
YEAR BUILT; 1ST SLD BLT; LAST SLD-BUILT; ORIG SELL PRICE; TOT COST 
PRES OWN; DOWN PAYMENT; ORIG VALUE 1ST MTGE; ORIG VALUE 2ND MTGE; 
AMORTIZATION; FIRST MORTGAGE; SECOND MORTGAGE; 1ST AND 2ND MORTGAGES; 
HOLDER (FIRST, SECOND); WRITTEN (FIRST, SECOND); 3RD MTGE; two sets of 
BREAD WINNER columns (AGE; OCCUP; WEEKS EMPLOY.; TOTAL EARN.; DEPEND); 
DOWN PAYMENT; EDUC, VAC; AUTO, HELP; FURNISH., EQUIP; CLOTHES, INS; 
SAVINGS; MOVIES, THEATRES; BOOKS, MAGAZINES; HOME IMPR; ENF SAV., INT; 
JOB, SCHOOL; MOVING. Group headings: HOUSE DESCRIPTION; ORIGINAL HOUSE 
COST; MORTGAGE PAYMENTS; DOUBLE PUNCH.] 

Tabulation and cross-tabulation: The "tabulator" consists of sev- 
eral banks of small adding machines which may be arranged electrically 
to record, total, and print almost any desired combination of data from 
the cards. The same thing can be accomplished by the use of the sorter 
alone, but only after multiple sorting, rearranging, and computing. The 
difference can be demonstrated by further reference to Figure 19. 

It was desired to prepare a table showing the average cash payment 
according to the total price to the present owner in $1,000 intervals. 
Without the tabulator, it would be necessary to make a sort on column 
23 and then to re-sort and count each pack separately according to 
columns 25 and 26. This would give results in the form of 12 fre- 
quency distributions each having possibly 100 class intervals from 
which the average cash payments could be derived. With the tabulator 
it was necessary to sort only on column 23.³ The tabulator was then 
set to count the items in each $1,000 price group, to add the exact 
amounts recorded in columns 25 and 26 and to print the subtotals 
and grand totals as shown in Figure 20, columns 1, 2, and 3. The 
amounts of the first and second mortgages, columns 4 and 5, were 
also totaled during this same operation. Reading across the first row, 
the total cash payments made by 11 purchasers of houses costing 
between $2,000 and $3,000 amounted to $4,700; the total value of the 
first mortgages of these 11 houses was $17,900 and the total value 
of the second mortgages was $4,800. 
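
The work done by the tabulator in this operation corresponds to what is now called a grouped count and total. The following Python sketch is offered only as an analogy: the field names `col23`, `cash`, `first`, and `second`, and the three sample cards, are hypothetical and are not taken from the study.

```python
from itertools import groupby
from operator import itemgetter


def tabulate(cards):
    """Count and total each consecutive pack of cards sharing a col23 value.

    The cards must already be in order on "col23", just as the sorter
    must arrange them before they are run through the tabulator.
    """
    rows = []
    for group, pack in groupby(cards, key=itemgetter("col23")):
        pack = list(pack)
        rows.append((
            group,
            len(pack),
            sum(card["cash"] for card in pack),
            sum(card["first"] for card in pack),
            sum(card["second"] for card in pack),
        ))
    # Grand totals of the count and of the three amount columns.
    grand_totals = tuple(sum(row[i] for row in rows) for i in range(1, 5))
    return rows, grand_totals


if __name__ == "__main__":
    # Hypothetical cards; amounts are in hundreds of dollars.
    cards = [
        {"col23": 2, "cash": 5, "first": 16, "second": 4},
        {"col23": 2, "cash": 3, "first": 18, "second": 5},
        {"col23": 3, "cash": 9, "first": 15, "second": 11},
    ]
    rows, grand_totals = tabulate(cards)
    for row in rows:
        print(row)
    print("totals", grand_totals)
```

Like the machine, `groupby` starts a new subtotal whenever the value in the control column changes, which is why the preliminary sort cannot be omitted.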

Recording results in tables: When the sorter alone is used a large 
number of blank forms must be prepared in advance for the recording 
of any possible cross-tabulation that may be needed, but with the 
tabulator this task becomes much lighter. 

³ When used in conjunction with the tabulator, the principal function of the sorter is 
to arrange the cards in consecutive order according to some one classification which will 
become the stub of the resulting table. If this classification has required the use of two 
columns on the card, the right hand or units column should be sorted first. These ten 
packs are then piled together in order, with zeros at the bottom and are run through the 
sorter on the left hand or tens column. Since the cards at the bottom of the pile pass 
through the sorter first, the unit zeros are sorted first and fall at the bottom of each tens 
compartment, followed by the unit ones, twos, etc. When the tens packs are then piled 
together, with zeros at the bottom, the entire set of cards is in consecutive order, and is 
ready to tabulate. The tabulator can be set to separate them according to each unit, if de- 
sired. In the case illustrated, only the tens values were needed, so it was necessary to sort 
and tabulate only on column 23, disregarding column 24 entirely. 
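
The two-pass procedure just described (units column first, then tens column) is the same principle that programmers now call a least-significant-digit radix sort. A minimal Python sketch follows, assuming each card is represented by a string of digits and that only the positions 0-9 are punched; the function names are hypothetical.

```python
def sort_on_column(cards, column):
    """One pass through the sorter on a single column.

    Each card drops into the pocket for the digit punched in `column`;
    the pockets are then piled together in order. Cards within a pocket
    keep the order in which they arrived.
    """
    pockets = [[] for _ in range(10)]
    for card in cards:
        pockets[int(card[column])].append(card)
    return [card for pocket in pockets for card in pocket]


def sort_two_columns(cards, tens_column, units_column):
    """Sort on the units column first, then on the tens column."""
    return sort_on_column(sort_on_column(cards, units_column), tens_column)


if __name__ == "__main__":
    # Hypothetical two-column codes, one per card.
    pack = ["31", "07", "30", "12", "05"]
    print(sort_two_columns(pack, tens_column=0, units_column=1))
```

The method works only because each pass preserves the order left by the pass before it; if the tens column were sorted first, the second pass would destroy the arrangement by tens.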



FIGURE 20 

REPRODUCTION OF THE PRINTED RECORD FROM THE 
TABULATING MACHINE WITH HEADINGS ADDED, 
DATA FROM CODE SHEET, FIGURE 19 

      (1)            (2)          (3)            (4)            (5) 
 COST OF HOUSE     NUMBER        TOTAL       TOTAL VALUE    TOTAL VALUE 
  TO PRESENT         OF         OF CASH       OF FIRST       OF SECOND 
  PURCHASER         CASES       PAYMENT       MORTGAGES      MORTGAGES 
   ($1,000)                     ($100)         ($100)         ($100) 

       2              11           47            179             48 
       3              27          238            411            302 
       4              32          357            772            296 
       5             158         1186           4631           2748 
       6             134         1816           4574           2091 
       7              95         1998           3559           1497 
       8              62         1173           2685           1294 
       9              41          932           1999            842 
                      23          629           1178            532 
       1              14          417            798            338 
       †              22          968           1292            526 
                     619*        9761*         22078*         10514* 

† The information in this row was recorded in the B position of the cards. The tabulator did 
not print the B. 

* Asterisks denoting totals are part of the machine record. 

The first step is the preparation of the plan for the derivative table 
which will result from the tabulation. Then an outline form of the 
primary table is made. This is used by the machine operator in arrang- 
ing the machine. When the cards are tabulated the results are printed 
by the machine in a form similar to Figure 20. This is the primary 
table. From the primary table the derivative table can be constructed. 

TABLE 14 

DERIVED TABLE FROM TABULATING MACHINE RECORD, FIGURE 20: 
ARITHMETIC AVERAGE OF CASH PAYMENT AND ORIGINAL FACE VALUE OF 
FIRST AND SECOND MORTGAGES FOR DIFFERENT PROPERTY COSTS. 
RESTRICTED TO PROPERTIES PURCHASED IN 1922 OR AFTER. 
619 BUFFALO FAMILIES 

                                                 AVERAGE        AVERAGE 
     COST OF          NO. OF      AVERAGE       FACE VALUE     FACE VALUE 
   PROPERTY TO        CASES        CASH           FIRST          SECOND 
  PRESENT OWNER                   PAYMENT        MORTGAGE       MORTGAGE 

  $2,000- 2,999         11        $  430         $1,630         $  440 
   3,000- 3,999         27           880          1,520          1,120 
   4,000- 4,999         32         1,120          2,410            920 
   5,000- 5,999        158           750          2,930          1,740 
   6,000- 6,999        134         1,360          3,410          1,560 
   7,000- 7,999         95         2,100          3,750          1,580 
   8,000- 8,999         62         1,890          4,330          2,090 
   9,000- 9,999         41         2,270          4,880          2,050 
  10,000-10,999         23         2,730          5,120          2,310 
  11,000-11,999         14         2,980          5,700          2,410 
  12,000-15,999         22         4,400          5,870          2,390 

  Total or average     619         1,580          3,570          1,700 

In this case Table 14 was constructed by dividing the totals in each 
row of columns 3, 4, and 5 of Figure 20 by the number of houses 
in the row, column 2. 
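
The division that produces the derived table can be written out as a short Python sketch. The totals are those printed in Figure 20; the function names are hypothetical, and the rounding of each average to the nearest $10 is an inference from the printed figures of Table 14 rather than a stated rule.

```python
# Printed record of Figure 20. Each row gives the lower limit of the cost
# group in dollars, the number of cases, and three totals in hundreds of
# dollars: cash payments, first mortgages, second mortgages.
FIGURE_20 = [
    (2000, 11, 47, 179, 48),
    (3000, 27, 238, 411, 302),
    (4000, 32, 357, 772, 296),
    (5000, 158, 1186, 4631, 2748),
    (6000, 134, 1816, 4574, 2091),
    (7000, 95, 1998, 3559, 1497),
    (8000, 62, 1173, 2685, 1294),
    (9000, 41, 932, 1999, 842),
    (10000, 23, 629, 1178, 532),
    (11000, 14, 417, 798, 338),
    (12000, 22, 968, 1292, 526),
]


def average_dollars(total_in_hundreds, cases):
    """Arithmetic average in dollars from a total stated in $100 units."""
    return total_in_hundreds * 100 / cases


def derive_table(rows):
    """Return (lower limit, cases, three averages in dollars) per row."""
    return [
        (low, n) + tuple(average_dollars(t, n) for t in totals)
        for low, n, *totals in rows
    ]


if __name__ == "__main__":
    for low, n, cash, first, second in derive_table(FIGURE_20):
        print(f"{low:>6,} {n:>4} {cash:>8.0f} {first:>8.0f} {second:>8.0f}")
```

For the first row the sketch gives 4,700 / 11, or about $427, which agrees with the $430 shown in Table 14 once rounded.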

Summary of Mechanical Tabulation. It is hardly to be expected 
that the whole process of mechanical tabulation will be clear merely 
from reading this description. Perhaps it will serve as a guide as to 
what to look for in a demonstration of the machines. 

Mechanical tabulation is a great assistance in accounting and 
statistical work both internal and external. But it cannot be used to 
advantage in all cases. Certain types of work call for the mechanical 
process; others are not adapted to its use. The outstanding criterion 
is the size of the investigation. If either a large number of cases or a 
large amount of cross-information is involved, the mechanical process 
should be used. 

PROBLEMS 

1. What routine does an editor follow in searching for irregularities in 
schedules? 

2. What is meant by re-editing? When should the process be employed? 

3. What would you include in the qualifications of an editor? 

4. The following is a card from the investigation illustrated in Figure 13. 
This card was returned to the collecting agent by the editor. What do you 
think the editor found wrong and what did he want the agent to do? 

COLLECTION CARD USED IN RESIDENTIAL VACANCY 
INVESTIGATION IN BUFFALO, N. Y. 

Serial No 

Address 526 So. Elm St. front and rear houses 

Ward Tract Enumeration District 

No. of Dwelling Places in Building 

One Two Three ~ 

Four... .4 - Over Four (give number) 

Occupied 2 Vacant 1 

Residential X Combination X 

A A. S. Agnew 

Agent S 






5. Describe in detail an investigation in which you would use sorting-counting 
at the preliminary tabulation stage. 

6. Describe in detail an investigation in which you would use a tally sheet at 
the preliminary tabulation stage. 

7. Describe in detail an investigation in which you would use a work sheet 
at the preliminary tabulation stage. 

8. What is the principle of mechanical tabulation? 

9. What are the advantages of mechanical tabulation? 

10. Describe an investigation in which mechanical tabulation should not be 
used. 

11. The following is an approximate reproduction of the invoice form used by 
a firm having 63 salesmen, 4,000 customers, and listing in its catalogue 
13,200 commodities classified in 12 departments. 

JONES SMITH INC. 
HEAVY HARDWARE 
"Serving the United States" 

Date 
Invoice No 
Dept 
Salesman 
Territory 

SOLD TO 

Quantity        Commodity        Unit Price        Amount 












Prepare a code for transferring all of the information on this invoice to 
45-column mechanical tabulation cards. The code should be prepared so 
that information could be taken from the cards concerning any of the fol- 
lowing either separately or in combinations: sales from day to day, sales by 
departments, the record of any one individual invoice, the amount of goods 
sold over a period to any individual customer, the distribution of sales by 
states, cities, or territories, the sales made during a month by each sales- 
man, the quantity of each commodity sold during a period, the amount of 
sales during a period. 




REFERENCES 

BAILEY, WILLIAM B., and CUMMINGS, JOHN, Statistics. Chicago: A. C. 
McClurg & Co., 1917. 

Chapter IV contains a complete and lucid statement of the purpose of edit- 
ing schedules and the duties of an editor. 

BROWN, LYNDON O., Market Research and Analysis. New York: The Ronald 
Press Co., 1937. 

An excellent statement on editing and methods of testing a sample appears 
in chapter 12 and rules for preliminary tabulation are given in chapter 13. 

EIGELBERNER, J., The Investigation of Business Problems. Chicago and New 
York: A. W. Shaw Co., 1926. 

Chapter XIII explains the function of the editor as official critic of the 
quality of collected information. 

SCHLUTER, WILLIAM C., How To Do Research Work. New York: Prentice- 
Hall, Inc., 1929. 

A concise statement of the editing process appears in chapter XI. 

Standards of Research. Des Moines, Iowa: Meredith Publishing Co., 1929. 

An excellent statement of the functions and qualifications of an editor 
appears on pages 29-32. Instructions for mechanical tabulation are given 
on pages 32-36 and a standard code on pages 69-75. 



CHAPTER VIII 
TABULATION 

DEFINITIONS 

THE TABULATION of statistical data is the orderly arrange- 
ment of concrete numerical information in vertical columns 
and horizontal rows. This definition excludes lists of facts 
which are not numerical and mathematical tables which deal with 
abstract numbers. It contains three separate concepts: (1) numerical 
information regarding actual items, events, values, or relationships; 
(2) a definite order of arrangement for this information; (3) the 
preparation of forms with rows and columns in which the numerical 
data may be recorded according to their orders of arrangement. 

The method of collecting concrete numerical information has been 
discussed in chapters IV and VI. It has already been indicated in these 
chapters that the final form in which the collected data are to be 
arranged must be anticipated to some extent in planning the collection, 
and also in the stages of preliminary tabulation as explained in chap- 
ter VII. However, the thorough analysis of the various orders of 
arrangement has been reserved for this chapter. The latter part of 
the chapter is concerned with the third factor in tabulation, the prin- 
ciples and practices in preparing tabular forms for the recording of 
classified statistical data. 

Two kinds of statistical information may be presented in tabular 
form: (1) several sets of more or less heterogeneous information, 
and (2) data representing a definite universe expressed in a common 
unit. In the first kind of table the several sets of information are not 
expressed in the same unit, but they are arranged according to a single 
common characteristic, such as the dates of successive observations. 
Grouping such data in tabular form is a space-saving device very often 
used and is a legitimate statistical technique provided that the different 
sets of information bear some relation to each other. The other kind 
of table contains homogeneous data employing a common unit and 
arranged according to one or more definite orders of classification. 
The ensuing discussion of the elements and orders of classification 
deals with the second kind of table. 


Classification 

Classification is the arrangement of a set of observations or com- 
puted figures according to a previously determined plan. This arrange- 
ment may involve the separation of a whole into parts or the listing of 
related sets of information.¹ 

Elements of Classification. Each plan of classification involves 
several indispensable elements: 

1. The data have been collected and arranged for some definite 
statistical purpose. 

2. The enumeration or collection has involved one unit, that is, 
the items counted were all defined in the same way. 

3. These similarly defined units each possess one or more of the 
same variable characteristics so that all of them can be classified 
according to each of these characteristics. 

4. For each classification that involves the separation of a whole 
(that is, a total of identically defined units) into its parts, the classes 
must be mutually exclusive. 

The foregoing statements all refer to data in the original form in 
which they have been collected, that is, to primary data. The distinction 
between primary and derived tables will be mentioned later, but for 
the present the elements of classification and the various orders of 
classification are discussed in the simplest terms, as applied to pri- 
mary data. 

Purpose: The purpose is so closely connected with the other ele- 
ments of classification that it requires little elaboration./ For example, 
we wish to know how many students in a freshman class are men 
and how many are women. "Freshman students," therefore, is selected 
automatically as the unit to be counted, and the variable characteristic 
according to which the units will be classified is "sex." Since this 
variable can have only two classes, male and female, a very simple 
classification will result: 



1 This definition differs materially from that given in Edmund E. Day, Statistical 
Analysis (New York: The Macmillan Company, 1925), pp. 36 and 42. Day distinguishes 
classification the separation of a larger group "population" or "universe" into smaller 
groups or classes on the basis of a specified criterion or characteristic featurefrom 
sertation the arrangement of an orderly succession of items relating differences in one 
variable to differences in another. This distinction, although theoretically a fundamental 
one, is disregarded in the discussion of the principles and methods of tabulation in this 
chapter, because classification and seriation both follow the same rules with regard to 
arrangement. The distinction made by Day is referred to at the end of chapter XIII, in 
introducing graphic methods that involve two variables. 




Male 125 

Female 98 

Total freshmen 223 

This elementary illustration shows why the final objective must be 
kept clearly in mind at the initial stages of the investigation. If no 
provision were made for recording male or female on the student's 
registration card, there would be no reliable way of counting the num- 
ber of men and women, since such names as "LaVerne" or "Marion" 
may be of either sex. 
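
The count described above can be sketched in Python as a modern illustration. The registration cards, the field name "sex," and the function name are hypothetical. The point is only that a characteristic must be recorded on each unit before the units can be classified by it.

```python
from collections import Counter

# Hypothetical registration cards. "sex" is recorded on each card because
# the purpose of the count requires it; the name alone would not suffice.
cards = [
    {"name": "Marion", "sex": "Female"},
    {"name": "LaVerne", "sex": "Male"},
    {"name": "John", "sex": "Male"},
]

def tabulate(cards, characteristic):
    """Count the units in each class of one variable characteristic."""
    counts = Counter(card[characteristic] for card in cards)
    result = dict(counts)
    result["Total"] = sum(counts.values())
    return result

table = tabulate(cards, "sex")
for label, number in table.items():
    print(f"{label:<8}{number:>4}")
```

If the "sex" entry were missing from the cards, the lookup would fail at once, which is the programmatic form of the warning given in the text.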

Units: In any table of primary data, the unit is whatever is being 
counted. One should be able to select any single figure from a table, 
for example, the first figure from Table 1, chapter I, and ask, "804,350 
what?" The answer, which can be read from the title of the table, 
is always in terms of the unit forming the basis of that particular tabu- 
lation, and it will be equally applicable to every figure in the table. 
In this case the unit is passenger automobiles sold. Note that, although 
one might also say 804,350 Chevrolets were sold in 1937, these further 
limitations do not apply to all of the figures in the table. They are 
therefore not part of the definition of the unit of the table. Chevrolet 
is one class of one of the variable characteristics "make of automobile" 
and 1937 is one class of another variable characteristic "model year." 

Variable characteristics: Collected data may possess one or more 
variable characteristics according to the degree of detail necessitated 
by the purpose of the investigation. In the foregoing illustration, the 
variable characteristic "make of automobile" was subdivided into the 
classes, Chevrolet, Ford, and Plymouth, and the variable characteristic 
"model year" also had three classes, 1937, 1938, and 1939. In the 
previous illustration the freshmen who were classified by sex might 
have been differentiated according to a second characteristic, "university 
division." If a similar count were made in several universities, for 
several successive years, the complete data would contain four variable 
characteristics: university, division, sex, and year of entrance. Each 
student counted could be distinguished according to all four of the 
characteristics; for instance, student No. 1 might be a man, at Uni- 
versity C, in Business Administration, entering in 1940. 

Mutually exclusive classes: In the total count, therefore, each student 
would fall into one, and only one, of the possible categories under each 
of the four variable characteristics, that is, in one class of each classi- 
fication. The class "entered in 1940" could not include any of those 
counted as "entered in 1941"; none of the freshmen entering Univer- 
sity A could also be counted as entering University B, etc. Likewise 
in Table 1, chapter I, each one of the 4,613,424 passenger cars sold 
during the 3 years is included in only one of the individual figures 
(known as "cells") of the table, except that the figures of the last row 
are subtotals, each one including all items in the column above it. 

It will be explained later that classification according to two or 
more characteristics may be crossed in a single table, as in the table 
of automobile sales. However, there will be no overlapping of the 
several classes in any one classification unless two or more variable 
characteristics become confused in listing the classes. If this is allowed 
to occur, such classifications as the following may result: 

HOMICIDES IN THE CITY OF NEW YORK, 1926 

Manhattan      218        Shooting       213 
Brooklyn        82        Assault         53 
Bronx           19        Stabbing        54 
Queens          22        Gas              7 
Richmond         3        Infanticide      7 
Whole city     344        Poison 
By Negroes      41        Accidental      10 
By husbands      4        By police       19 
By wives         3        Suicide         10 

In this table one must assume that the figure in the sixth row 
(whole city) is the total, since it is the sum of the figures for the five 
boroughs. These six rows alone comprise a correct tabulation of the 
whole number of homicides divided into five mutually exclusive classes 
according to the single characteristic, place of occurrence. The re- 
mainder of the table lists homicides according to at least three addi- 
tional characteristics: race of the person committing the crime, relation- 
ship to the victim, and cause of death. If all of the classes of these 
several classifications were given so that each classification included all 
of the 344 cases, the table would not be incorrect, although it would 
afford no cross-classification. As it stands, all the classes appear to 
be parts of a single classification, but there is overlapping between 
them. For instance, some of the 41 homicides by Negroes probably 
were committed in Manhattan, some of the Negroes may have been 
husbands of the victims, and some of the stabbing cases may have 
also been committed by Negroes. A single homicide could have 
answered to every characteristic that has been suggested; hence the 
classes are not mutually exclusive. 
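
The test of mutual exclusiveness can be stated mechanically: every unit must fall into one, and only one, class. The following Python sketch applies that test to a few invented case records. The records, field names, and function names are assumptions made for illustration and are not taken from the 1926 data.

```python
# Hypothetical unit records; the fields and values are illustrative only.
cases = [
    {"borough": "Manhattan", "method": "Shooting"},
    {"borough": "Manhattan", "method": "Stabbing"},
    {"borough": "Brooklyn", "method": "Shooting"},
    {"borough": "Queens", "method": "Assault"},
]

def classes_matched(case, classes):
    """Return the names of all classes whose test the case satisfies."""
    return [name for name, test in classes.items() if test(case)]

def is_mutually_exclusive(cases, classes):
    """True when every case falls into one, and only one, class."""
    return all(len(classes_matched(c, classes)) == 1 for c in cases)

# One classification, by place of occurrence only.
by_borough = {
    b: (lambda c, b=b: c["borough"] == b)
    for b in ("Manhattan", "Brooklyn", "Bronx", "Queens", "Richmond")
}

# Two characteristics confused in a single list of classes.
mixed = dict(by_borough)
mixed["Shooting"] = lambda c: c["method"] == "Shooting"
mixed["Stabbing"] = lambda c: c["method"] == "Stabbing"

print(is_mutually_exclusive(cases, by_borough))
print(is_mutually_exclusive(cases, mixed))
```

The borough classes pass the test. The mixed list fails it because a single case answers to both a borough and a method, which is the overlapping described above.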




Orders of Classification, Time, Space and Attribute. There are 
three main types of characteristics with respect to which data may be 
classified: (1) variation in time, (2) variation in space, and (3) vari- 
ation of an attribute. In a previous example freshman students were 
classified according to (1) time (year of entrance), (2) space (loca- 
tion of the university), and (3) the two attributes, sex, and university 
division in which each one was registered. Thus all three orders of 
classification are represented in this example. Another case of the 
three orders in a single set of data can be found in the tabulation of 
yearly car loadings in the United States, by "months," by "regions," 
and by "size of railroad." On the other hand many tables will 
contain only one or two orders of classification, or there may be 
two or three variable attributes and no variation in either time or 
space. 

In all of the examples that have been mentioned, the number of 
units in each class was obtained by counting objects, that is, freshmen, 
automobiles sold, loaded cars, and even the cases of homicide, accord- 
ing to their several variable characteristics. In many tabulations, how- 
ever, counted objects or persons are replaced by prices or rates. The data 
may be expressed in units of dollars and cents but each figure is 
actually a ratio meaning "number of dollars paid per bushel" or "per 
ton," etc. Such data may be classified in the same way as countable 
units, according to mutually exclusive variable characteristics, which 
may relate to changes in either the numerator or the denominator of 
the ratio. 

Table 15 contains three such classifications of wheat prices, each 
of which illustrates one of the orders of classification. Table 15-A 
shows changes in the unit "Average Weekly Cash Price of No. 2 
Hard Winter Wheat at Chicago" classified according to time of occur- 
rence. The four weeks quoted constitute four classes in this time 
classification. Other characteristics are held constant by the definition 
of the unit, i.e., all prices are taken from the same market, for cash 
sales, for the same grade of wheat. There is no overlap in the classes 
of the classification. 

Table 15-B shows changes when the unit "Average Cash Price of 
No. 2 Hard Winter Wheat for the Week of October 3-8, 1938" is 
classified according to the variable characteristic place of occurrence. 
The three cities named as markets are three mutually exclusive classes 
in the spatial classification. Other characteristics are held constant by 



TABULATION 149 

the definition of the unit, i.e., all prices are taken for the same time 
period, for cash sales, for the same grade of wheat. 

Table 15-C shows changes in the unit "Average Cash Price of Spring 
Wheat at Minneapolis for the Week of October 3-8, 1938." The five 
grades of wheat in the attribute classification are clearly defined and 
non-overlapping in accordance with the definitions established by 
the United States Grain Standards Act. Other characteristics of the 
unit are held constant by the definition, i.e., all prices are taken for 
the same time period, for cash sales, in the same market. 

TABLE 15 
THREE TYPES OF CLASSIFICATION* 

A. TIME 

AVERAGE WEEKLY CASH PRICES OF NO. 2 HARD WINTER WHEAT AT CHICAGO 
FOR FOUR WEEKS OF OCTOBER, 1938 

                                          AVERAGE PRICE 
WEEK                                         PER BU. 

Oct. 3-8                                     $ .669 
Oct. 10-15                                     .672 
Oct. 17-22                                     .674 
Oct. 24-29                                     .680 

B. SPACE 

AVERAGE CASH PRICES OF NO. 2 HARD WINTER WHEAT IN THREE MARKETS FOR THE 
WEEK OF OCTOBER 3-8, 1938 

                                          AVERAGE PRICE 
MARKET                                       PER BU. 

Chicago                                      $ .669 
Kansas City                                    .638 
St. Louis                                      .678 

C. ATTRIBUTE 

AVERAGE CASH PRICES OF FIVE GRADES OF SPRING WHEAT AT MINNEAPOLIS FOR THE 
WEEK OF OCTOBER 3-8, 1938 

                                          AVERAGE PRICE 
GRADE                                        PER BU. 

Dark Northern Spring Heavy     No. 1         $ .738 
Dark Northern Spring           No. 1           .733 
                               No. 2           .701 
Northern Spring                No. 1           .640 
Hard Amber Durum               No. 2           .651 

* Crops and Markets, United States Department of Agriculture, Vol. XV, No. 11 (Novem- 
ber, 1938), p. 254. 

The three types of classification can be easily distinguished by 
noting that in Table 15-A all observations refer to a single place and 
invariant attributes, time being variable; in Table 15-B all observations 
refer to a single time interval and invariant attributes, location or space 
being variable; and in Table 15-C all observations refer to a single time 
period, a single place, and attributes that are invariant except the 
attribute of grade of wheat. Greater care is necessary in dealing with 
variable attributes than with variable time or place because a universe 
can have many different attributes. For example, classifications of 
prices of spring wheat could also be made according to percentage of 
dockage, kind of contract of sale, or type of purchaser. 
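
The three orders can also be seen by holding a set of price records in one list and letting only one characteristic vary at a time. The Python sketch below uses figures from Table 15. The record layout and the function name are assumptions made for illustration.

```python
# Each observation is a price in dollars per bushel with three
# characteristics. The figures are those of Table 15; the record layout
# is a hypothetical one chosen for this sketch.
HW2 = "No. 2 Hard Winter"
prices = [
    {"week": "Oct. 3-8", "market": "Chicago", "grade": HW2, "price": 0.669},
    {"week": "Oct. 10-15", "market": "Chicago", "grade": HW2, "price": 0.672},
    {"week": "Oct. 17-22", "market": "Chicago", "grade": HW2, "price": 0.674},
    {"week": "Oct. 24-29", "market": "Chicago", "grade": HW2, "price": 0.680},
    {"week": "Oct. 3-8", "market": "Kansas City", "grade": HW2, "price": 0.638},
    {"week": "Oct. 3-8", "market": "St. Louis", "grade": HW2, "price": 0.678},
    {"week": "Oct. 3-8", "market": "Minneapolis",
     "grade": "No. 1 Northern Spring", "price": 0.640},
    {"week": "Oct. 3-8", "market": "Minneapolis",
     "grade": "No. 2 Hard Amber Durum", "price": 0.651},
]

def one_way(records, vary, **held_constant):
    """Classify prices by one characteristic, holding the others constant."""
    table = {}
    for r in records:
        if all(r[key] == value for key, value in held_constant.items()):
            table[r[vary]] = r["price"]
    return table

by_time = one_way(prices, "week", market="Chicago", grade=HW2)
by_space = one_way(prices, "market", week="Oct. 3-8", grade=HW2)
by_grade = one_way(prices, "grade", week="Oct. 3-8", market="Minneapolis")

for title, table in (("TIME", by_time), ("SPACE", by_space), ("ATTRIBUTE", by_grade)):
    print(title, table)
```

In each call the keyword arguments play the part of the definition of the unit: they fix the characteristics that are not allowed to vary.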

Attribute classifications are sometimes divided into qualitative and 
quantitative. Table 15-C is an example of a division according to the 
qualitative attribute, grade of wheat. Attribute classifications are quan- 
titative when the attribute is expressed numerically, as in size or price 
groups. Such classifications take the form of frequency distributions, 
the discussion of which is deferred to chapter XV. 



TYPES OF TABLES 

Statistical tables can be divided into two categories: primary and 
derivative. 

Primary Tables 

A primary table is a full presentation of the collected data in the 
original units. An investigation of any complexity may require several 
primary tables. Such complete tables serve as a basis from which the 
statistician selects certain related sets of data that may be presented 
in various ways, depending on the purpose in view. 

If the original data are to be published for general use without 
knowledge of what the uses will be or which relationships will be 
considered most important, primary tables may be given in full. Due 
to the expense of publication such tables are not commonly found in 
print, but certain parts of the original data such as a group of subtotals 
comprising a grand total may be published in order to bring the 
important data together in compact form. 

Derived Tables 

A derived table is one that presents the results of some analysis 
of the original data, such as percentage distributions, per cents of 
increase or decrease, values per capita, index numbers, or coefficients. 
These are constructed from the original data by the application of 
statistical methods and may be published either alone or accompanied 
by a part or all of the data upon which they depend. 






The chief requirement of a derived table is that it should present 
one unified set of relationships. An attempt to set forth too many ideas 
in one derived table usually results in confusion. The preferable 
method is to use several short clear tables each of which has one defi- 
nite purpose. Thus one primary table frequently becomes the source 
for many derived tables. 

Parts A and B of Table 16 show two entirely different sets of infor- 
mation, but both were drawn from the same primary table which gave 
the distribution of explosives workers of each grade of skill according 
to average hourly earnings. 

TABLE 16 
TWO DERIVED TABLES FROM ONE PRIMARY TABLE* 

A 

PERCENTAGE DISTRIBUTION OF EXPLOSIVES WORKERS, BY AVERAGE HOURLY EARNINGS 
AND SKILL, OCTOBER, 1937 

AVERAGE HOURLY EARNINGS                    SEMI-        UN- 
(IN CENTS)                     SKILLED    SKILLED     SKILLED      TOTAL 

Under 37.5                        .4         .7         2.2          .8 
 37.5 and under  42.5             .3        1.4         1.9          .9 
 42.5 and under  47.5             .5        1.0         3.9         1.2 
 47.5 and under  52.5             .5        2.5         9.3         2.7 
 52.5 and under  57.5            2.3        5.4        17.0         5.8 
 57.5 and under  62.5            4.1       10.3        20.6         8.8 
 62.5 and under  67.5            7.5       16.3        14.1        11.2 
 67.5 and under  72.5            7.5       14.2        15.6        10.8 
 72.5 and under  77.5            7.8       15.6         7.1         9.9 
 77.5 and under  82.5           10.7       11.5         4.4         9.8 
 82.5 and under  87.5           14.8       13.5         2.0        12.1 
 87.5 and under  92.5           10.7        3.6         1.0         7.0 
 92.5 and under  97.5           10.2        2.3          .5         6.2 
 97.5 and under 102.5            7.5        1.1          .3         4.4 
102.5 and under 107.5            5.2         .2                     2.9 
107.5 and under 112.5            5.3         .4          .1         3.0 
112.5 and under 125.0            3.5                                1.8 
125.0 and over                   1.2                                 .7 

Total                          100.0      100.0       100.0       100.0 

B 

HOURLY EARNINGS RECEIVED BY EXPLOSIVES WORKERS IN THE UNITED STATES 
ACCORDING TO GRADES OF SKILL, OCTOBER, 1937 

                             HOURLY EARNINGS RECEIVED IN CENTS 

                    Middle Wage Received               Middle Wage Received 
GRADES OF SKILL        by Lower Half        Median        by Upper Half 
                        of Workers                          of Workers 

Skilled                    73.7              85.3              96.4 
Semi-skilled               63.6              71.9              80.8 
Unskilled                  54.8              61.3              69.4 

* Adapted from Monthly Labor Review, United States Department of Labor, Bureau of 
Labor Statistics, Vol. 47, No. 2 (August, 1938), pp. 383, 384. 
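
That Part B can be derived from the same distribution as Part A may be checked roughly with the Python sketch below. It interpolates along a straight line within the class that contains the required percentile. Two things here are assumptions rather than statements from the source: that this interpolation is the method used, and that the "middle wage" of each half corresponds to the first and third quartiles.

```python
# Class limits in cents per hour: 37.5, 42.5, ..., 112.5, then 125.0.
edges = [37.5 + 5 * i for i in range(16)] + [125.0]

# Per cent of skilled workers in each of the 18 classes of Table 16-A,
# from "Under 37.5" through "125.0 and over".
skilled = [0.4, 0.3, 0.5, 0.5, 2.3, 4.1, 7.5, 7.5, 7.8,
           10.7, 14.8, 10.7, 10.2, 7.5, 5.2, 5.3, 3.5, 1.2]

def grouped_percentile(per_cents, edges, p):
    """Locate the p-th percentile by interpolating within its class."""
    cumulative = 0.0
    for k, share in enumerate(per_cents):
        if cumulative + share >= p:
            if k == 0 or k == len(per_cents) - 1:
                raise ValueError("percentile falls in an open-ended class")
            lower, upper = edges[k - 1], edges[k]
            return lower + (p - cumulative) / share * (upper - lower)
        cumulative += share
    raise ValueError("per cents do not reach the requested percentile")

lower_middle, median, upper_middle = (
    round(grouped_percentile(skilled, edges, p), 1) for p in (25, 50, 75)
)
print(lower_middle, median, upper_middle)
```

Under these assumptions the three results for skilled workers agree with the corresponding row of Part B (73.7, 85.3, and 96.4 cents).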




Table 17 furnishes an example of a derived table in which some of 
the original data are presented along with the analysis. It appears 
rather complex but a study of the per cent columns reveals that only 
one form of analysis is being presented, namely the per cent of dwell- 
ings vacant in each ward for the various types of buildings. 

ESTABLISHED PRACTICE IN THE CONSTRUCTION OF TABLES 

The difference in purpose and content between primary and derived 
tables has been explained, and in general the rules for tabulation will 
apply to either kind. Derived or summary tables appear in print more 
frequently and they present the chief problems in table construction 
from the point of view of utility to the reader. The users of statistical 
analyses may be divided into two groups, those who will read tables 
and those who will not. As far as the second type of readers is con- 
cerned tabular matter might just as well be kept out of print. For 
their benefit it is necessary to point out in the text the most important 
information contained in any table. Those who will read tables would 
prefer to have the textual description omitted so that they can draw 
their own conclusions from the data presented. For the sake of this 
group the tables must be made as effective as possible. 

Certain principles and practices which contribute to that end have 
become well established by usage and should generally be followed, 
although occasionally some deviation from customary procedure may 
increase the effectiveness of a table. In such cases it is more important 
to use good judgment than to follow rules slavishly. 

Unity 

The data contained in a table should pertain to one definite subject, 
should be confined to that subject, and the table should include what- 
ever information is pertinent to a complete presentation of the subject. 

In Primary Tables. In a primary table there can be no question 
as to unity regardless of the degree of cross-classification if the subject 
is presented in terms of a single unit as illustrated by Table 12, 
page 124, in chapter VII. 

[TABLE 17: per cent of dwellings vacant in each ward, by type of building] 

Other primary tables may contain information expressed in several 
units but with no sacrifice of unity due to the fact that significant ratios 
can be derived from combinations of the various sets of data. Table 18 
is of this kind. The primary data given in the table are in three differ- 
ent units, number of telephones, number of messages, and number of 
dollars of operating income. However, all of them deal with the one 
subject, operations of the telephone company. Such ratios as number 
of messages per telephone, income per telephone, income per message 
of each type, as well as indexes showing relative changes in each unit 
or in each ratio, afford numerous possibilities for analysis. 

TABLE 18 
OPERATING STATISTICS OF THE BELL TELEPHONE SYSTEM, 1931-36* 

                          NUMBER OF MESSAGES               OPERATING INCOME 

           (1)            (2)            (3)             (4)             (5)              (6) 
                                    Toll and Long 
        NUMBER OF        Local         Distance         Local            Toll 
YEAR    TELEPHONES   (000 omitted)  (000 omitted)      Service         Service           TOTAL 

1931    15,389,994    22,704,825       985,500      $723,920,495    $326,268,854    $1,050,189,349 
1932    13,793,229    21,525,558       823,866       670,736,747     263,147,955       933,884,702 
1933    13,162,905    20,147,635       747,155       617,253,153     243,905,775       861,158,928 
1934    13,378,103    20,676,520       781,830       607,676,275     258,691,363       866,367,638 
1935    13,844,663    21,465,285       830,740       640,993,436     273,483,256       914,476,692 
1936    14,453,552    22,869,510       911,340       665,152,512     306,238,511       971,391,023 

* Annual Reports of the American Telephone and Telegraph Company. 

In Derived Tables. In a derived table simple units are replaced or 
supplemented by such measures as compound units, averages, and 
percentage relationships. These measures are usually based upon sev- 
eral separate sets of data but the derived table becomes a unified whole 
if all of the relationships included contribute to a single purpose. 
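
As an illustration of how such measures are obtained, the ratios suggested for Table 18 can be computed directly from its primary data. The Python sketch below copies the figures from Table 18. The function name and the choice of ratios are assumptions made for illustration. The message counts are multiplied by 1,000 because the headings state "000 omitted."

```python
# Figures copied from Table 18, in column order (1) through (5):
# telephones, local messages (000), toll messages (000),
# local service income ($), toll service income ($).
table_18 = {
    1931: (15_389_994, 22_704_825, 985_500, 723_920_495, 326_268_854),
    1932: (13_793_229, 21_525_558, 823_866, 670_736_747, 263_147_955),
    1933: (13_162_905, 20_147_635, 747_155, 617_253_153, 243_905_775),
    1934: (13_378_103, 20_676_520, 781_830, 607_676_275, 258_691_363),
    1935: (13_844_663, 21_465_285, 830_740, 640_993_436, 273_483_256),
    1936: (14_453_552, 22_869_510, 911_340, 665_152_512, 306_238_511),
}

def derived_row(telephones, local_000, toll_000, local_income, toll_income):
    """Form ratios between the three primary units of Table 18."""
    total_messages = (local_000 + toll_000) * 1000
    total_income = local_income + toll_income
    return {
        "messages per telephone": total_messages / telephones,
        "income per telephone": total_income / telephones,
        "income per local message": local_income / (local_000 * 1000),
        "income per toll message": toll_income / (toll_000 * 1000),
    }

derived = {year: derived_row(*row) for year, row in table_18.items()}
for year, row in derived.items():
    print(year, {name: round(value, 3) for name, value in row.items()})
```

Each ratio relates two of the primary units to one another, so a table built from any one of them would serve a single purpose.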

An example of a compound unit is the "ton-mile," a measure of 
operating density in railroading. It represents one ton moved the 
distance of one mile and is derived from the two simple units "tons 
of freight" and "miles operated." Similarly in other lines of activity 
"man-hours," "dollar-years" and "foot-pounds" appear. Each of these 
is more complex than a simple unit yet each presents a single concept 
so that its use results in a unified table. 

Table 17 is an example of a derived table in which per- 
centages accompany data that have been classified in two directions. 
These percentages are based upon a third classification, number of 
dwellings occupied and vacant, the original data of which are not 
included. All of the information in the table contributes toward the 
one subject, percentage of vacancy in dwelling places. It has already 
been noted that a derived table loses its effectiveness if it presents too 
many kinds of relationships. A number of additional sets of percentage 
relationships could be worked out from the original data of Table 17, 
such as percentage distribution of vacancy by wards or by type of 
building, but if any of these were included there would be a loss of 
unity and the resulting table would have no definite purpose. 

Example of Lack of Unity. Many tables appear in print which 
do not possess the unity found in Tables 17 and 18. An example of 
heterogeneous data presented in condensed form is shown in Table 19. 
The information on types of freight car loadings together with the 
Federal Reserve index forms a table complete in itself. "Pullman 
passengers carried" has nothing to do with freight car loadings nor 
with "financial statistics." "Canal traffic" on only two canals, and 
measured in two different kinds of tons, is possibly the most important 
information on the subject of water traffic, but more complete data 
should be given in a separate table, since there is no possible way of 
relating this information to that concerning freight car loadings. 
Tables that contain several sets of unrelated information or of related 
data expressed in units that are non-comparable are justifiable only in 
publications in which space-saving is a more important consideration 
than unity. 

Complexity 

The orders of classification employed in an investigation depend 
upon the nature of the data and the purpose for which they are 
collected. The extent to which it is necessary to study combinations 
of the several characteristics of the data will determine the degree of 
cross-classification required in tabulation. 

Simple Classification. The first order of classification, commonly 
referred to as a "one-way table," has been illustrated in Table 15. 
In each part of this table the prices of wheat are classified according 
to a single characteristic and no difficulty arises in their presentation. 
A derived table that contains a single classification follows the same 
form of construction. 

Cross-Classification. If classification is desired according to two 
characteristics simultaneously it will not serve the purpose merely to 
list the two separately. They must be cross-classified in a "two-way" 
table. This obviously requires listing in two directions, consequently 
one classification will appear horizontally and the other vertically, as 
in Figure 21. In this case the kinds of animals slaughtered are listed 
horizontally across the top. These headings are known as the caption 
and the vertical lists of data beneath the several headings are referred 
to as columns. The names of the packing companies appear down the 
left side of the table. Any such vertical listing is termed the stub of 
a table and the horizontal lists of data following the several items are 
called rows. In order to determine the identity of a figure in any one 
cell of the table it would be necessary to follow the column up to the 
caption and the row across to the stub on the left. 

[TABLE 19: condensed table combining freight car loadings and the Federal Reserve index, 
Pullman passengers carried, financial statistics, and canal traffic] 

FIGURE 21 

FORM FOR TWO-WAY CROSS-CLASSIFICATION 

NUMBER OF CATTLE, CALVES, HOGS AND SHEEP SLAUGHTERED BY THREE MEAT 
PACKERS IN 1937 

                           ANIMALS SLAUGHTERED 
PACKERS           Cattle      Calves      Hogs      Sheep 

Company X 
Company Y 
Company Z 

Total 
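
The form of Figure 21 can be filled mechanically once each count is tagged with its two classes. The following Python sketch is one possible arrangement. Since Figure 21 is a blank form, the counts are invented, and the variable and function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical counts, invented only to show how the cells are filled.
records = [
    ("Company X", "Cattle", 120), ("Company X", "Hogs", 300),
    ("Company Y", "Cattle", 80), ("Company Y", "Sheep", 50),
    ("Company Z", "Calves", 40), ("Company Z", "Hogs", 210),
]

caption = ["Cattle", "Calves", "Hogs", "Sheep"]   # headings across the top
stub = ["Company X", "Company Y", "Company Z"]    # headings down the left side

# Each cell is identified by one class from the stub and one from the caption.
cells = defaultdict(int)
for packer, animal, number in records:
    cells[(packer, animal)] += number

def cross_table(cells, stub, caption):
    """Return the rows of a two-way table, ending with a Total row."""
    rows = [[s] + [cells[(s, c)] for c in caption] for s in stub]
    totals = ["Total"] + [sum(cells[(s, c)] for s in stub) for c in caption]
    return rows + [totals]

table = cross_table(cells, stub, caption)
print(f"{'PACKERS':<12}" + "".join(f"{c:>8}" for c in caption))
for row in table:
    print(f"{row[0]:<12}" + "".join(f"{n:>8}" for n in row[1:]))
```

Looking up a cell by the pair (stub class, caption class) is the same operation as following the row across to the stub and the column up to the caption.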

When three or more orders of classification are desired the problem 
becomes more difficult since a two-dimensional sheet of paper must 
serve as the medium for a three- or four-dimensional relationship. The 
only possible solution is to subdivide identically each class of one or 
both of the first two classifications. Figure 22 illustrates a "three-way" 
table, in which each class of animal has been subdivided to show two 
types of inspecting agency and the total of each class. Finally, either 
these same classes may be again subdivided or each of the classes in 
the stub may be subdivided to take care of a fourth classification, 
resulting in a "four-way" table. Figure 23 in which grade of meat 
has been added, and Figure 24 adding a time classification, illustrate 
respectively the method of combining four and five orders of classifi- 
cation in a single table. Additional classifications could be introduced 
if there were others pertinent to the data. Inspection of Figures 23 
and 24 shows, however, that they contain too much information to be 
comprehended easily. Whenever the further subdivision of data leads 
to tables which are too complex to be read easily, it is preferable to 
increase the number of tables. Do not spend time devising ways of 
presenting multiple classifications in a single table; make two or more 
tables instead. A fairly good, though not universal, rule is to confine 
a table to three classifications if it is being made for publication. 

FIGURE 22 

FORM FOR THREE-WAY CROSS-CLASSIFICATION 

NUMBER OF CATTLE, CALVES, HOGS AND SHEEP SLAUGHTERED UNDER FEDERAL AND 
CITY INSPECTION BY THREE MEAT PACKERS IN 1937 

                             ANIMALS SLAUGHTERED AND INSPECTION AGENCY 

                    Cattle                 Calves                  Hogs                  Sheep 
PACKERS      Federal  City  Total   Federal  City  Total   Federal  City  Total   Federal  City  Total 

Company X 
Company Y 
Company Z 

Total 

[FIGURE 23: form for four-way cross-classification, adding grade of meat] 

[FIGURE 24: form for five-way cross-classification, adding a time classification] 

From the point of view of construction the addition of a set of 
percentage relationships to the original data increases the complexity 
but does not add another order of classification. Thus if the percent- 
age of animals slaughtered by each packer were required in Figure 21, 
it would be necessary to introduce the subheadings "Number" and "Per 
Cent Distribution," under each type of animal in the caption. It would 
then have the appearance of a three-way table although only a two- 
way classification. 
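
The distinction can be illustrated with a short Python sketch in which a "Per Cent Distribution" figure is attached to each number. The counts and names below are hypothetical. The percentages are computed from the numbers already classified, so they add a pair of subheadings but no new classification.

```python
# Hypothetical numbers of animals slaughtered, by animal and packer.
numbers = {
    "Cattle": {"Company X": 120, "Company Y": 80, "Company Z": 50},
    "Hogs": {"Company X": 300, "Company Y": 100, "Company Z": 100},
}

def with_per_cent(column):
    """Pair each number with its per cent of the column total."""
    total = sum(column.values())
    return {packer: (n, round(100 * n / total, 1)) for packer, n in column.items()}

table = {animal: with_per_cent(column) for animal, column in numbers.items()}

for animal, column in table.items():
    print(animal)
    print(f"  {'PACKERS':<12}{'Number':>8}{'Per Cent':>10}")
    for packer, (number, per_cent) in column.items():
        print(f"  {packer:<12}{number:>8}{per_cent:>10}")
```

Every cell is still identified by packer and kind of animal alone, which is why the table remains a two-way classification.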

Clarity 

The reader's ability to grasp the content and significance of a table 
depends primarily upon the clarity of wording in every part of the 
table. Careful attention must be given to the phraseology of the title 
and all headings, and to the inclusion of any necessary notes of ex- 
planation and reference. 

Title. The first essential is a title which in the simplest form will 
tell what is in the table. If several lines are required to describe the 
contents, a brief title explaining the major characteristics can be used 
with a subtitle in smaller print giving the more detailed description. 

The title should clearly name the unit or units in which the data 
are being presented including all the limitations on the data. These 
limitations usually include time, space, and exact specifications of the 
units employed. If a presentation of some derived relationship is the 
main purpose of the table that relationship should be stated with equal 
precision. The methods of classification used should also be specifically 
indicated especially in studies of limited scope when it is only this 
latter qualification that distinguishes one table from another. In such 
cases it is not necessary to repeat in each table the general description 
common to all of them. 

Stated briefly, if the title definitely answers the questions, what, 
where, when, and how, it is probably adequate. No better guide can 
be found for the correct and specific wording of titles than the usage 
in the United States Bureau of Census publications. Such titles as 
"Prime Movers, Motors and Generators, by Number and Rated 
Capacity, for Establishments Classified According to Number of Wage 
Earners Employed: 1929" or "Percentage of Homes Owned and Rented, 
by Color and Nativity of Head of Family, for the United States: 1930, 
1920, 1900 and 1890" illustrate how the various phrases needed in a 
complicated title may be best arranged to give a clear idea of the 
content of the table. 

Headings. Every part of the table requires a heading. This in- 
cludes general headings for caption and stub, for each order of classifi- 
cation, and for each of the separate classes. Clarity and brevity are the 
chief considerations in wording these headings. They must be com- 
plete enough to serve as accurate guides to the data, although lack 
of space usually requires that they be worded as briefly as possible. 
There should be no unnecessary repetition of information that has 
already been given in the title. Each main heading should include 
whatever detail applies to all of its subheadings so that the latter 
which are the most crowded of all may be very short. The mechanics 
of arranging these headings contributes to the effectiveness of the table 
and will be discussed later in the chapter. 

In referring to the data in a table it is convenient to be able to 
designate the columns by number. In Figure 24 the numbers are 
placed at the extreme top of the column headings. As an alternative 
the numbers can be placed just above the line separating the headings 
from the data as shown in Figure 23. Sometimes the horizontal rows 
of a table are also numbered but this practice is less common. The 
numbering of columns or rows is not a requirement but merely a con- 
venience to be used whenever it will facilitate the description of the 
table in the text or reference to it in subsequent tables. The stub itself 
is ordinarily not numbered as a column. 

Another common practice that aids in reading a table is the insertion 
of the unit directly over the columns to which it refers, as illustrated in 
Table 19. Thus the data of certain columns are clearly desig- 
nated as thousands of cars, thousands (of passengers), thousands of 
dollars, and thousands of tons, respectively. In the index number 
column "Monthly average, 1923-25 = 100" is inserted in a similar 
position. When the headings are in the stub the units are stated along 
with each item or in a separate column, as in Table 51, page 289. 

Footnotes. Anything in a table which cannot be understood by the 
reader from the title and headings should be explained in one or more 
footnotes. These footnotes should contain statements concerning 
figures that are missing, preliminary or revised, and explanations con- 
cerning any unusual figures or other features of the table that are not 
self-explanatory. A study of tables appearing in print will provide 
multiple illustrations of the use of footnotes. Table 27, page 222, 
is an excellent illustration. 

References. A table should always give exact reference to the
source or sources from which the data were taken. Three advantages
grow out of such citations: (1) The reader is given a sound basis for
evaluating the data. (2) Readers who wish to obtain other data similar
to those appearing in the table are able to do so. (3) The author
of the table insures himself against the inconsistency of source books.
For example, the data for wheat production will depend entirely upon 
which issue of Agricultural Statistics one happens to use. If a table 
contains data that have not been published previously, this fact should 
be stated in a note including the name of the collecting agency. In 
general, the use of exact references is a method of guarding against 
the charge of inaccuracy. 

Arrangement 

The arrangement of a table on the page, the arrangement of data
in the table, and the choice of ruling, spacing, and type face all
contribute to its effectiveness.

Fitting the Table to the Page. The limitations of the size of the 
printed page determine the form of tables to a large extent, hence the 
real problem of arrangement is to fit the table to the page so that it 
will be effective in that setting. One of the most important features 
is symmetry with respect to the margins and binding of the page. The 
table should be planned to read from left to right. Tables which must 
be read from the side of the page, tables which cover two facing pages, 
and tables which must be unfolded either sideways or vertically are 
occasionally necessary, but they should be used only when no combi- 
nation of smaller tables will serve as well. 






The proportions of the page may also determine which headings 
to use as stub and which as caption. In order that the height of a 
table may exceed the width, a one-way table is usually arranged ver- 
tically and in a cross-classification the longer list of items will ordi- 
narily appear in the stub. Length of wording of the headings is 
another factor to consider. It is better to use the longer wording in 
the stub if possible since too many words crowded into a narrow 
column heading are very hard to read.

When additional classifications are introduced into either the cap- 
tion or the stub, the new headings may be arranged in subordinate 
positions as in Figures 22, 23 and 24, or the table may be rearranged 
to make the former classification subordinate to the new one. In all 
such cases the question will arise of arranging columns and rows so 
as to emphasize significant relationships. The chief consideration in 
Figure 24 is comparison between 1936 and 1937, consequently the 
time classification has been arranged in pairs of adjacent columns. 
If this were not the case, time could be made a main classification 
leaving the types of inspection in adjacent columns, as follows: 



                        CATTLE

           1936                       1937

Federal    City    Total    Federal    City    Total



Order of Items. The three types of classification according to time, 
space, and attributes have been discussed. A classification in any of 
these categories frequently results in a large number of subdivisions 
or classes, which necessitates the introduction of a definite order of 
arrangement. 

Time classifications follow the natural order of the events repre- 
sented. It is only when the major emphasis falls on the most recent 
events that a reverse time order is used.

Spatial classifications sometimes follow the order of geographical 
proximity as when the main subdivisions of the United States are 
given from northeast to southwest. The New England States are fre- 
quently named in the order familiar to everyone, Maine, New Hamp- 
shire, Vermont, Massachusetts, Rhode Island, and Connecticut, but 
ordinarily a list of any length is most usable if arranged in alphabetical 
order. Size or importance is also occasionally the basis for spatial
classification. The method used must be determined according to the
nature of the data and the reader's familiarity with the criterion 
selected. 

Attributes of quality lead to great variety of tabular arrangement. 
These may be listed according to importance or in some other order 
familiar to the reader, but again for comparability and ease in locating 
any given item the alphabetical arrangement is preferable. 

An example of time arrangement in natural order appears in the 
stub of Table 19, page 156, and in the same table the classification of 
types of freight car loadings is alphabetical. It should be noted that 
an item such as "miscellaneous" or "other" is placed at the end of the 
list in any order of arrangement. Table 20 shows three examples of 
other orders that may be followed. 

In Table 20-A the data in a spatial classification are listed according 
to the size of the city in which they occur. Tables 20-B and 20-C show 
two possible arrangements of one set of data in an attribute classifi- 
cation, one being in order of size and the other an arbitrary arrange- 
ment. In the former it is the size of the per cents, i.e., the data that 
are being tabulated, that is used as a basis for the arrangement of the 
classes, whereas in 20-A the determining size was that of the geo- 
graphical classes themselves rather than the amount of tax receipts. 
The order of 20-C is a combination of form of organization, impor- 
tance, and respectability. That is, commercial banks, industrial banks, 
and personal finance companies are privately owned corporations 
operated for profit. The next four are co-operative plans usually 
conducted on a small scale. The pawnbrokers and unlicensed lenders 
are the most important members of the group but are not on the same 
plane of respectability as the others because of the questionable busi- 
ness methods sometimes employed. 
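
The orders of arrangement discussed above can be sketched in a few lines of Python. This is only an illustration: the percentages are those of Table 20 as read from the printed table, while the helper names `by_size` and `alphabetical` and the `CATCH_ALL` list are assumptions introduced here, not part of the text.

```python
# Percentage of small loan business by type of lending agency (Table 20-B and 20-C).
loans = {
    "Commercial banks": 7.3,
    "Industrial banks": 13.9,
    "Personal finance companies": 19.3,
    "Employers' plans": 0.8,
    "Credit unions": 2.4,
    "Remedial loan societies": 2.3,
    "Axias": 1.9,
    "Pawnbrokers": 23.2,
    "Unlicensed lenders": 28.9,
}

# Labels treated as catch-all classes (an assumed list); these always go last.
CATCH_ALL = {"miscellaneous", "other"}


def by_size(data):
    """Order the classes by the size of the data, largest first (as in 20-B)."""
    return sorted(data, key=lambda name: data[name], reverse=True)


def alphabetical(names):
    """Alphabetical order, with any catch-all class moved to the end."""
    return sorted(names, key=lambda n: (n.lower() in CATCH_ALL, n.lower()))


size_order = by_size(loans)
alpha_order = alphabetical(list(loans) + ["Other"])

print(size_order)
print(alpha_order)
```

The arbitrary arrangement of 20-C cannot be produced by a sorting rule; in this sketch it is simply the order in which the dictionary was written.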

Ruling, Spacing, and Type Face. These are devices for increasing
the effectiveness of a table by concentrating emphasis on important
entries and by relieving the monotonous appearance of figures in rows
and columns. Whenever rulings aid the reader in understanding the
classifications and subclassifications of a table they should be used.
Double and triple rulings are not necessary since equal effectiveness
can be achieved by using a single heavier line to separate major
divisions in a table. It will be observed that in Figures 23 and 24, pages
158 and 159, every column is separated from the next by rulings, but
that only the main groups of rows are so separated. In many printed
tables there are no horizontal rulings but the separation between the
rows and groups is accomplished by appropriate spacing and by
successive indentations of items to indicate various degrees of
subclassification. Bold face type, larger type, and italics are frequently used to
set off totals or percentages from the other data or to emphasize
important items. The use of these devices is well illustrated in the tables
presented in the first section of each monthly issue of the Survey of
Current Business.

TABLE 20

EXAMPLES OF TABLE ARRANGEMENTS

A

SPATIAL DISTRIBUTION ACCORDING TO SIZE OF CHARACTERISTIC

TOTAL TAX RECEIPTS OF LARGE CITIES, 1935*

CITY NUMBER                        TAX RECEIPTS
IN ORDER        CITY               (000,000
OF SIZE                            omitted)

    1           New York             $586
    2           Chicago               209
    3           Philadelphia           90
    4           Detroit                82
    5           Los Angeles            57
    6           Cleveland              42
    7           St. Louis              31
    8           Baltimore              34
    9           Boston                 65
   10           Pittsburgh             42
   11           San Francisco          31
   12           Milwaukee              34
   13           Buffalo                33
   14           Washington             29
   15           Minneapolis            23
   16           New Orleans            19
   17           Cincinnati             19
   18           Newark                 32
   19           Kansas City            18
   20           Seattle                15
                etc.

B

ATTRIBUTE DISTRIBUTION ACCORDING TO SIZE OF DATA

PERCENTAGE DISTRIBUTION OF SMALL LOAN BUSINESS DONE
BY VARIOUS LENDING AGENCIES†

                                   PERCENTAGE OF
TYPE OF LENDING AGENCY             SMALL LOAN
                                   BUSINESS

Unlicensed lenders                     28.9
Pawnbrokers                            23.2
Personal finance companies             19.3
Industrial banks                       13.9
Commercial banks                        7.3
Credit unions                           2.4
Remedial loan societies                 2.3
Axias                                   1.9
Employers' plans                         .8

    Total                             100.0

C

ATTRIBUTE DISTRIBUTION ACCORDING TO ARBITRARY ARRANGEMENT

PERCENTAGE DISTRIBUTION OF SMALL LOAN BUSINESS DONE
BY VARIOUS LENDING AGENCIES†

                                   PERCENTAGE OF
TYPE OF LENDING AGENCY             SMALL LOAN
                                   BUSINESS

Commercial banks                        7.3
Industrial banks                       13.9
Personal finance companies             19.3
Employers' plans                         .8
Credit unions                           2.4
Remedial loan societies                 2.3
Axias                                   1.9
Pawnbrokers                            23.2
Unlicensed lenders                     28.9

    Total                             100.0

* Statistical Abstract, U. S. Department of Commerce, 1937, p. 218.
† Evans Clark, Financing the Consumer (New York: Harper & Bros., 1933), p. 30.

Totals 

What Totals To Include. A table is not complete unless it includes 
whatever totals and subtotals are required to summarize the data pre- 
sented, but this does not mean that every row and column must be 
totaled. A total implies that the same unit is used in all of the classes 
added and that the several classes taken together form a homogeneous
whole.

This principle can be explained by reference to Figure 24. The 
four kinds of animals slaughtered, cattle, calves, hogs, and sheep, 
have not been added together because a total would imply, for example, 
that a beef carcass means the same thing as a hog carcass in terms of 
meat and meat products. The time classification has not been totaled, 
since a total made up of two years' observations would be purely arbi- 
trary. The emphasis in a time comparison such as this is usually on 
the relation of the several different periods to each other. However, in 
case the entire period covered in a table represents a genuine total, such 
as a year's production resulting from the sum of the production for 12 
months, a total for the year would have significance. 

On the other hand a total for each kind of animal by inspecting 
agency is included because the classification represents the separation 
of a whole into comparable parts and because the total is a production 
figure of value for comparisons in the other classifications. Likewise 
three sets of totals have been computed vertically: (1) the subtotal
for each company, (2) the subtotal of each grade of meat, and (3) 
the total of both grades of meat for all three companies. It will be 
noted that the selection of subclassification in the stub of Figure 24 
has resulted in bringing together in the last section the subtotals of 
each grade of meat, whereas the subtotals for each company are sepa- 
rated from each other. Comparisons of the latter can easily be made 
since the company subtotals and the total of all companies are printed 
in italics. 

The preceding examples dealt entirely with classifications in which 
the units were counted objects and included totals wherever pertinent. 
In classifications of counted objects which are not parts of a total, 
sometimes an adjusted total may be used, but in general no total should 
be included. Distinct from these are classifications of rates or prices,
as in Table 15, page 149, in which averages should take the place of totals
whenever a summary figure is required to make the table complete. A
derived table may include totals, averages, or neither, according to the
nature of the data and the purpose for which it is constructed.

Position of Totals. The natural sequence of reading is properly
accommodated by placing totals at the foot of the columns and at the 
right of the rows. Statistical practice may reverse this, placing totals 
at the top and left when they are more important than the individual 
items. Both positions are in common use. This question must be de- 
cided for each table in terms of what the maker believes will best serve 
his purpose. Briefly, the usage is: totals at the top and left or at the 
bottom and right; but not at the top and right, nor at the bottom and 
left. Subtotals follow the same usage in any given table, that is, if 
the totals are at the top and left of the table the subtotals will also 
appear at the top and left of the items from which they are computed. 

Significant Figures 

There are two aspects of the subject of significant figures in a table. 
The first relates to the number of significant figures which need to be 
retained for accuracy, while the second relates to a total and its parts.

Number of Figures Retained. Tables are often unnecessarily
encumbered by retaining all of the digits in numbers running to millions
or even billions. A good example of this is Table 18, page 154, in which
too many digits have been retained purposely for discussion at this
point. It was stated in chapter II that four significant figures are adequate
for statistical work. The unit for telephones could therefore be
changed to 10,000 telephones, but customarily units are used in thou- 
sands, millions, or billions. In accordance with this practice the number 
of telephones should be expressed in thousands, making the figures 
accurate to five digits. The column showing the number of toll calls 
should be expressed in units of one million calls accurate to one deci- 
mal place, giving four significant figures. The entire table has been 
reproduced in corrected form as Table 21. The revised form is more 
effective and sufficiently accurate for statistical purposes. 
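
A small Python sketch may make the two operations concrete: rounding to a fixed number of significant figures, and restating a figure in a customary unit such as thousands. The function names and the full telephone count are hypothetical, chosen only so that the result agrees with the 1931 entry of Table 21; halves are rounded upward, which is an assumption about the rounding rule.

```python
def round_significant(n: int, figures: int = 4) -> int:
    """Round a whole number to `figures` significant figures (halves round up)."""
    sign = -1 if n < 0 else 1
    n = abs(n)
    digits = len(str(n))
    if digits <= figures:
        return sign * n
    factor = 10 ** (digits - figures)
    return sign * ((n + factor // 2) // factor) * factor


def in_units(n: int, unit: int) -> int:
    """Express a non-negative whole number in units of `unit` (halves round up)."""
    return (n + unit // 2) // unit


telephones = 15_389_994  # hypothetical unrounded count

four_figures = round_significant(telephones, 4)  # 15,390,000
in_thousands = in_units(telephones, 1_000)       # 15,390 (five digits retained)

print(four_figures, in_thousands)
```

Stating the figure in thousands keeps five digits, one more than the four-figure minimum, which is the compromise described above in favor of a customary unit.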

The general rule is, retain only four or at most six significant figures
in the tabular presentation of data, but in all cases indicate the size of
the unit used either by showing the number of digits omitted from
each number or by using the expressions "in thousands" or "in
millions," etc. Note that in column 3 "(000,000 omitted)" refers to the
number of digits dropped between the original decimal place and the
newly established one, regardless of the fact that the data as written
are accurate to hundred thousands.

TABLE 21

REVISED FORM OF TABLE 18

                        NUMBER OF MESSAGES            OPERATING INCOME
          NUMBER OF     (000,000 omitted)             (000,000 omitted)
          TELEPHONES
YEAR      (000                      TOLL AND     LOCAL       TOLL
          omitted)      LOCAL       LONG         SERVICE     SERVICE     TOTAL
                                    DISTANCE
          (1)           (2)         (3)          (4)         (5)         (6)

1931      15,390        22,705      985.5        $723.9      $326.3      $1,050.2
1932      13,793        21,526      823.9         670.7       263.1         933.9
1933      13,163        20,148      747.2         617.3       243.9         861.2
1934      13,378        20,677      781.8         607.7       258.7         866.4
1935      13,845        21,465      830.7         641.0       273.5         914.5
1936      14,454        22,870      911.3         665.2       306.2         971.4

Rounding Off Totals. A different question arises when the table
consists of a total and its parts. The entire table should be made up 
from the original data and each item then rounded off separately, as 
the data in Table 21 were rounded off from Table 18. As a result 
the sum of the individual items as they appear in the rounded-off 
table may be either greater or less than the rounded-off total of the 
original data. One such instance may be seen in Table 21. For 1932 
the total operating income, column 6, does not correspond exactly to 
the sum of its parts in columns 4 and 5, although reference to Table 18 
will show that no error has been made either in the addition or in 
rounding off any of the three figures. 
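
The effect can be reproduced with a short Python sketch. The unrounded dollar amounts below are invented for illustration (Table 18 is not reproduced here); they were chosen only so that the rounded results agree with the 1932 entries of Table 21. The function name `to_millions` is likewise an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_millions(dollars: int) -> Decimal:
    """Round a dollar amount to millions of dollars with one decimal place."""
    millions = Decimal(dollars) / 1_000_000
    return millions.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Hypothetical unrounded operating income for one year.
local_service = 670_749_000
toll_service = 263_149_000
total = local_service + toll_service  # 933,898,000

sum_of_rounded_parts = to_millions(local_service) + to_millions(toll_service)
rounded_total = to_millions(total)

print(sum_of_rounded_parts)  # sum of the parts as they appear in the table
print(rounded_total)         # total rounded from the original data
```

Each part loses a little less than half a unit in rounding, and the two losses together are enough to carry the total to the next tenth, so neither the addition nor the rounding is at fault.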

The percentage distribution is a particular case of the part to total 
relation which requires further explanation. For example, in Table 22 
the percentage distribution of the interest-bearing debt should add to 
100 per cent whether it is carried to one decimal place or two, because 
the total debt is being distributed and the sum of the parts must be 
equal to the whole. However, the exact sum of column 2 is 99.99 and 
of column 3 is 100.1. The discrepancy arises from rounding off the 
last decimal place in the computations. No theoretical question is in- 
volved but merely the practical one of the best method of expressing 
the total in the table. Since the total is exactly 100 per cent, in any
such case of apparent discrepancy it should be written to one less
significant figure than appears in the individual per cents. Thus the
total of the first percentage distribution has been written 100.0 and
the second 100.

TABLE 22

THE INTEREST-BEARING DEBT OF THE UNITED STATES TREASURY ON APRIL 30, 1937,
ACCORDING TO TYPE OF OBLIGATION, AMOUNTS AND PERCENTAGE DISTRIBUTION*

                                   (1)             (2)               (3)
                                 AMOUNT         PERCENTAGE        PERCENTAGE
TYPE OF OBLIGATION             OUTSTANDING     DISTRIBUTION      DISTRIBUTION
                                (000,000      (two decimals)    (one decimal)
                                omitted)

General bonds                   20,133.7          58.70             58.7
U. S. savings bonds                755.5           2.20              2.2
Adjusted service bonds             409.6           1.19              1.2
Treasury notes                  10,377.4          30.26             30.3
Certificates of indebtedness       268.7            .78               .8
Treasury bills                   2,353.2           6.86              6.9

    Total                       34,298.1         100.0             100.

* Statement of the Public Debt of the United States, April 30, 1937, Treasury Department.
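
The discrepancy in Table 22 can be checked with a brief Python sketch. The amounts are those of column 1 as read from the printed table; the function name `percentage` and the use of half-up rounding are assumptions made for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Amount outstanding, in millions of dollars (Table 22, column 1).
amounts = {
    "General bonds": Decimal("20133.7"),
    "U. S. savings bonds": Decimal("755.5"),
    "Adjusted service bonds": Decimal("409.6"),
    "Treasury notes": Decimal("10377.4"),
    "Certificates of indebtedness": Decimal("268.7"),
    "Treasury bills": Decimal("2353.2"),
}
total = sum(amounts.values())


def percentage(part: Decimal, whole: Decimal, places: int) -> Decimal:
    """Per cent of the whole, rounded to the given number of decimal places."""
    step = Decimal(10) ** -places  # 0.01 for two places, 0.1 for one
    return (part * 100 / whole).quantize(step, rounding=ROUND_HALF_UP)


two_decimals = [percentage(a, total, 2) for a in amounts.values()]
one_decimal = [percentage(a, total, 1) for a in amounts.values()]

print(total)              # 34298.1
print(sum(two_decimals))  # 99.99
print(sum(one_decimal))   # 100.1
```

Neither column adds to exactly 100, although every individual per cent is correctly rounded, which is why the printed totals are written as 100.0 and 100.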

Checking 

Any possible errors in a table should be guarded against by com- 
plete checking. There may be errors in the numerical content due to 
mistakes in addition or other computations, or the validity of the table 
as a whole may be impaired by errors in judgment with regard to the 
general plan, items included, or details of wording and arrangement. 

Accuracy of Numerical Content. Checking of computations should 
be a routine step in table construction. Whenever any part is totaled 
the addition must be checked, preferably on a separate outline form. 
If there are horizontal totals and vertical totals as in Table 17, page 
153, the grand total (25,209) that is common to the two sets of totals 
must be checked in both directions. If the form is similar to Figure 
24, page 159, all parts of each subtotal must be checked as well as the 
grand total. It cannot safely be assumed that if the totals of a table 
check in one direction or if both sets of subtotals check with the grand 
total there are no mistakes in the figures. Similar precautions must be 
taken in the case of every computation in every step of table prepara- 
tion, especially apparently simple operations that are done mentally 
such as multiplying by 2 or dividing by 25. Each step should be 
checked before the next one is taken and corrections of errors should 
be rechecked. 
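
A minimal Python sketch of checking a two-way table in both directions follows. The function `check_totals` and all of the figures are hypothetical; the second example shows two compensating errors in one row that leave the row total and the grand total undisturbed and are caught only by re-adding the columns.

```python
def check_totals(cells, row_totals, col_totals, grand_total):
    """Re-add a two-way table and return a list of the discrepancies found."""
    problems = []
    for i, row in enumerate(cells):
        if sum(row) != row_totals[i]:
            problems.append(f"row {i + 1}: items add to {sum(row)}, "
                            f"total shows {row_totals[i]}")
    for j, column in enumerate(zip(*cells)):
        if sum(column) != col_totals[j]:
            problems.append(f"column {j + 1}: items add to {sum(column)}, "
                            f"total shows {col_totals[j]}")
    if sum(row_totals) != grand_total:
        problems.append("row totals do not add to the grand total")
    if sum(col_totals) != grand_total:
        problems.append("column totals do not add to the grand total")
    return problems


# Hypothetical table as originally compiled.
correct = [[12, 30],
           [8, 25]]
row_totals = [42, 33]
col_totals = [20, 55]
grand_total = 75

# The same table with two compensating copying errors in the first row.
miscopied = [[13, 29],
             [8, 25]]

print(check_totals(correct, row_totals, col_totals, grand_total))
print(check_totals(miscopied, row_totals, col_totals, grand_total))
```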

Validity. Errors of judgment may occur in the planning of tables 
or during their construction. Checking by an experienced person who
has an adequate knowledge of the background of the subject as well
as experience in table construction is required if such errors are to 
be avoided. The types of errors that could have been corrected by 
careful checking are illustrated in Table 23. 



TABLE 23

LIVE-STOCK EXPORTATION, 1929-32
(1,000 head)

YEAR                                  1928           1950           1931           1932

                                           PER            PER            PER            PER
                                           CENT           CENT           CENT           CENT

Hogs, live                         1,279    88      654    52      355    22      179    13
Hogs, butchered                       47     3      117     9      191    11       16     1
Bacon                                116     8      410    32      920    56    1,008    74
Ham and other products, pieces         8     1       76     7      179    11      160    12

    Total                          1,450   100    1,257   100    1,645   100    1,363   100



The title fails to name the country exporting. It indicates all kinds 
of livestock but the only live animal included in the table is hogs. 
Butchered hogs can scarcely be called livestock and certainly bacon 
and hams are not. The title names the years 1929-32 which might 
mean either annually or a total for the period. However, the headings 
do not agree with either, even disregarding the obvious typographical 
error in printing 1950. Since there are at least three different units
used in the table (head of hogs, pieces of ham, and bacon in some
unnamed unit), the unit "1,000 head" should not have been stated
along with the title. It should have read "All figures in thousands."
The caption heading "Year" is misplaced and the stub has no heading. 
The caption subheadings should read "Number" and "Percentage Dis- 
tribution." However, the most fundamental error is in the presenta- 
tion of totals and percentage distributions for items that are expressed 
in non-comparable units and that in no sense make up a total having 
any meaning. 
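
The objection to the totals of Table 23 can be expressed as a simple rule, sketched here in Python: add a set of items only when all of them are stated in the same unit. The function name `total_if_homogeneous` is hypothetical, and the unit labels attached to each item are an interpretation of the criticism above rather than something printed in the table.

```python
def total_if_homogeneous(items):
    """Add the values only when every item is stated in the same unit."""
    units = {unit for _, _, unit in items}
    if len(units) != 1:
        raise ValueError(f"no meaningful total: mixed units {sorted(units)}")
    return sum(value for _, value, _ in items)


# First column of figures in Table 23, with assumed unit labels.
first_column = [
    ("Hogs, live", 1279, "head"),
    ("Hogs, butchered", 47, "head"),
    ("Bacon", 116, "unit not stated"),
    ("Ham and other products", 8, "pieces"),
]

hogs_only = [item for item in first_column if item[2] == "head"]
print(total_if_homogeneous(hogs_only))  # the two hog items can be added

try:
    total_if_homogeneous(first_column)
except ValueError as error:
    print(error)  # the four items together cannot
```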

TABULAR FORMS

In the conduct of routine statistical work the tabulation of data 
becomes a continuous process of recording the same types of data 
daily, weekly, monthly, or annually as the case may be. For this pur- 
pose the preparation of standard forms in quantity saves time and 
promotes uniformity of records. These forms must be carefully 
planned and drawn up, hence they are excellent illustrations of the
use of the principles of tabulation. Frequently special adaptations
must be made to facilitate the recording of particular types of data, a 
circumstance which increases the desirability of studying such forms. 
On succeeding pages selected forms used by government agencies and 
by a business concern are presented with brief discussion. 

Government Forms 

Three forms used in the price section of the crop reporting service 
of the Department of Agriculture are presented as examples of the 
blanks prepared for recording external data. 

Figure 25 is used principally for summarizing the data on prices
received by farmers by months and by commodities. The printed headings
indicate the following: Column 1, weights for computing weighted average prices
for the United States; column 2, straight or unweighted average prices reported
by correspondents in each State; column 3, average prices reported by
correspondents as weighted by price reporting districts; column 4, price recommended
for the commodity by the State Statistician; column 5, price adopted by the Crop
Reporting Board after review of all the available data, including the original
price listing sheets, market price reports and other check data. Column 6, headed
"Extension," provides space for recording extensions of price times weight
computed in averaging the adopted State prices to obtain United States and division
averages. Since there are thirteen columns on this sheet it is possible to include
the record for two months on one sheet. This forms our primary record of
monthly prices.

FIGURE 25
DEPARTMENT OF AGRICULTURE FORM C. E. 1-128

[Blank form headed "Prices Received by Farmers," with spaces for commodity, date, and price unit. The stub lists the States by geographic division and ends with the United States. The ruled form is not reproduced here.]

FIGURE 26
DEPARTMENT OF AGRICULTURE LONG-TERM BLANK

[Blank form headed "(Commodity): Average Price per (Unit) (Received or Paid) by Farmers in (State), (From) 19__ (To) 19__." The stub is headed "Year" and the columns are headed Jan. through Dec., followed by "Av." The ruled form is not reproduced here.]

Source: Bureau of Agricultural Economics, Crop Reporting Board.

Figure 26 is designed to make possible the summarization of monthly 
prices for a number of years on one sheet. One or more of these sheets carries 
the complete record of monthly prices for each commodity, by States, including 
weighted annual averages. It is our practice to bind these sheets for all States 
together in books, one for each commodity. 

Figure 27 is provided for summarizing the monthly prices in another
way, with prices for all commodities for one State listed, three years to one
sheet. These forms are also bound together in books in which all States,
geographic divisions and the United States are represented. 3

FIGURE 27
DEPARTMENT OF AGRICULTURE FORM C. E. 1-1)9

[Blank ruled form, not reproduced here.]

These forms are so planned that they can be used for several dif- 
ferent kinds of tabulations. For example, Figure 27 is also used to 
record the individual crop reports from which the state averages of 
column 2 of Figure 25 are computed. 
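
The arithmetic behind columns 1, 5, and 6 of Figure 25, as described above, can be sketched in Python. The State names, weights, and prices are hypothetical, and the calculation shown (sum of extensions divided by sum of weights) is the ordinary weighted average, assumed here to be what the form is designed to support.

```python
from decimal import Decimal

# (State, weight from column 1, adopted price from column 5), all hypothetical.
rows = [
    ("State A", 2, Decimal("1.00")),
    ("State B", 3, Decimal("1.20")),
    ("State C", 5, Decimal("0.90")),
]

# Column 6, "Extension": price times weight for each State.
extensions = [weight * price for _, weight, price in rows]

total_weight = sum(weight for _, weight, _ in rows)
weighted_average = sum(extensions) / total_weight

# Straight (unweighted) average of the same prices, for comparison.
straight_average = sum(price for _, _, price in rows) / len(rows)

print(extensions)
print(weighted_average)
print(straight_average)
```

With these figures the weighted average falls below the straight average because the State with the largest weight reports the lowest price.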

Business Forms 

The routine tabulation by business concerns of data concerning 
their own operations differs from the record keeping of government 
agencies mainly in the type of data tabulated. This difference leads 
to the use of forms which are quite distinct from those previously 
presented, and which are specialized to meet the needs of the par- 
ticular concern using them. It follows that as many forms could be 
presented as there are business concerns, but those used by a single 
concern will illustrate some important uses and adaptations.

The forms shown on succeeding pages are used in the routine statis- 
tical work of the Eastman Kodak Company of Rochester, New York. 4 
This company divides the year into 13 four-week periods, hence all
of the forms which follow are filled out 13 times each year. 

Figure 28. The "Comparison of Sales by Divisions" is a form
prepared primarily for the use of executives. It is the usual type of 
two-way table intended to be read in either direction, i.e., the per- 
centage of increase or decrease of sales of any of the products can be 
compared from one division to another or the percentage of increase 
or decrease of sales of different products in any division can be com- 
pared. The percentage of increase or decrease of sales by divisions in 
next to the last column is not obtained from the preceding columns 
but by comparing sales in dollars with the corresponding figure for 
the preceding year. These percentages can be compared with the per- 
centage of change in bank debits given in the last column. 5 The prep- 
aration of the percentage of change of bank debits requires considerable 
work because the divisions of the country used by Eastman Kodak
Company as shown on the map do not coincide with Federal Reserve
Districts for which the percentages of change in bank debits are
available in print.

3 The explanation of the use of these forms was supplied by Mr. Roger F. Hale,
Agricultural Statistician, Division of Crop and Livestock Estimates, Bureau of Agricultural
Economics, United States Department of Agriculture, Washington, D. C.

4 They are presented with permission of the Eastman Kodak Company and are made
available through the courtesy of Mr. A. H. Robinson, Assistant-Treasurer.

5 The use of bank debits as an indicator of business activity is discussed in chapter
XX.

FIGURE 28
COMPARISON OF SALES BY DIVISIONS

PERCENTAGE CHANGE FROM PREVIOUS YEAR
Black figures are increases, red figures are decreases

             AMATEUR   AMATEUR   CK         PROF FILM                       XRAY &               BANK
DIVISION     CAMERAS   FILM      PRODUCTS   & PLATES    PAPERS   CHEMICALS  DENT FILM   TOTAL*   DEBITS

A
B
C
D
E
F
G
TOTAL US

* Excludes sales not common to all divisions

Features of this form are the ruling, the use of black and red ink 
to distinguish increases and decreases and the exclusion from the total 
column of sales not common to all divisions. For example, if there 
were some divisions in which "chemicals" were not sold, the sales of 
this product would be excluded from all divisions in computing the 
percentage of change of total sales. 
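
The exclusion rule can be illustrated with a small sketch. All figures, division labels, product names, and function names below are hypothetical; the point is only that a product missing from any division is dropped from every division before the percentage change of total sales is computed.

```python
def pct_change(current: float, previous: float) -> float:
    """Percentage change from the previous year's figure."""
    return (current - previous) / previous * 100.0


def common_products(sales: dict) -> set:
    """Products sold in every division."""
    product_sets = [set(by_product) for by_product in sales.values()]
    return set.intersection(*product_sets)


def total_pct_change(current: dict, previous: dict) -> dict:
    """Percentage change of total sales by division, common products only."""
    common = common_products(current) & common_products(previous)
    result = {}
    for division in current:
        cur = sum(current[division][p] for p in common)
        prev = sum(previous[division][p] for p in common)
        result[division] = pct_change(cur, prev)
    return result


# Hypothetical dollar sales; division B sells no chemicals.
previous = {
    "A": {"film": 100.0, "papers": 50.0, "chemicals": 20.0},
    "B": {"film": 80.0, "papers": 40.0},
}
current = {
    "A": {"film": 110.0, "papers": 55.0, "chemicals": 10.0},
    "B": {"film": 72.0, "papers": 36.0},
}

changes = total_pct_change(current, previous)
for division, change in changes.items():
    print(f"{division}: {change:+.1f}%")
```

In this made-up case the sharp drop in chemicals in division A does not enter its total, so the totals of A and B are compared on the same set of products.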

Figure 29. The line above the double ruling might read "Report
for Fifth Period Ending May 20, 1939." The record of






[FIGURE 29. Lost-Time Report form; the layout is not recoverable from this copy.
Legible row labels: Grand Total Lost Time; Total Lost Time Not Paid (incl. in
above Grand Total); Total Lost Time Paid For (incl. in above Grand Total);
Total No Lost or Overtime (Average for Period).]


FIGURE 30 


INSTRUCTIONS ON THE REVERSE SIDE OF LOST TIME


>rtable Employes
1 In "Summary of Total Employes" the figures on the total for Manag
partmental Superintendents and Assistant Superintendents, and Main
be indicated in a separate item.
) The "Average for Period" figures are averages of the four figures for th
week of the period.
i General. Under the section on "General Employes" are to be reporte
or Overtime" basis. The total "General Employes" figures should be t
) No Lost or Overtime. Under the section on "No Lost or Overtime Em
ployes except "General Employes." Exclude the Manager, Genera
Superintendents and Assistant Superintendents, and Main Office Depar
or Overtime Employes" figures should be the average for the period.


bsences of one hour or more are to be reported. Absences for any cause exce
period exceeding 26 weeks for any one employe.


nces with Permission (Code No. 1)
1 absences with permission other than for slack work, illness, accidents, vaci
excess over 40 hours should be indicated in one item for both "General Em
aployes."


nces without Permission (Code No. 2)
elude all time lost without permission and when no explanation is given.

: Work absence (Code No. 3) should be indicated for both "General" and

eportable absences on account of illness (Code No. 4) are to be reported
to both "General" and "No Lost or Overtime Employes").


> 8 hours or less.
) More than 8 hours, but not exceeding 40 hours.
l More than 40 hours, but not exceeding 26 weeks.


nces on account of accident and injuries (Codes No. 5 and No. 6) should
) Employe has returned to work.


) Employe has left employ of Company.
1 It has been decided to be a case of permanent disability.


.tions Paid For (Code No. 7)
ider this heading include all time granted for annual vacations which is paid
both "General" and "No Lost or Overtime Employes."


nce for discipline (Code No. 8) should be indicated for both "General" and


nce for "Excess Time Over 40 Hours" (Code No. 9) should be indicated
vertime Employes."











total employment is placed at the top of the sheet but in the re- 
mainder of the form the staff is divided into "General Employees" 
and "No Lost or Overtime Employees." The reasons for lost time at 
the left give a detailed view of what caused employees to be absent 
from work and the length of absence. A summary of actual hours 
worked, hourly rate of earnings, and payroll is included for each of 
the two types of employment. 

The features of this form are the spacing, variation in type, judicious 
use of ruling, and the inclusion of a transfer code. 

Figure 30. A person unfamiliar with the organization and opera- 
tions of the Eastman Kodak Company would experience considerable 
difficulty in filling out the form of Figure 29 because of the technical 
use of terms and the need for explanation of the method of computing 
averages and other summary figures. But variations in usage would 
also occur among those familiar with the form, if uniform interpreta- 
tions of terms and computations were not provided, hence the instruc- 
tions for filling out the form are printed on the back. These have 
been reproduced as Figure 30. 

The explanations are written for the guidance of persons thoroughly 
familiar with the way the company is organized and operated; there- 
fore much is omitted that would have to be stated in the instructions 
accompanying a similar form used in external statistical work. For 
example, the definition of reportable employees appears to be ambig- 
uous in stating that "General Employees" excludes "No Lost or 
Overtime Employees" while "No Lost or Overtime Employees" ex- 
cludes "General Employees." However, the persons who use this form 
understand that "General Employees" are those who work on a piece 
rate or hourly rate basis, and "No Lost or Overtime Employees" are 
those who work on a fixed weekly wage basis. Hence the instructions 
concerning reportable employees mean that regardless of the type of 
work performed by a particular individual during a given payroll 
period he is to be reported according to his permanent status either as 
a time worker or a salaried worker. 

Figure 31. The Labor Turnover Report like the Lost-Time Report 
is filled out for each works and each four-week period. The form is 
largely self-explanatory although it calls for considerable detail. The 
report requires information on three subjects: (1) the number em- 
ployed, (2) the number entering, and (3) the number leaving, but 
the emphasis is placed on an analysis of the exits. The primary sum- 






FIGURE 31
LABOR TURNOVER REPORT

[Form layout not recoverable from this copy. Legible labels include: Number of
Employees; First Day of Period; Entrances; Deduct "Transferred" and "Unavoidable."]




mary figure is the "Net Turnover Per Cent" obtained by dividing "Net 
Number Leaving" by the "Average Number of Employees for the 
Period." 
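
The ratio can be written out as a short computation. This is a minimal sketch, assuming that "Net Number Leaving" is total exits less those marked "Transferred" and "Unavoidable," which is how the deduction line on the form appears to read; the function name and all figures are hypothetical.

```python
def net_turnover_percent(total_exits: int, transferred: int,
                         unavoidable: int, average_employees: float) -> float:
    """Net Number Leaving divided by the average number of employees, in per cent."""
    # Assumption: exits by transfer and unavoidable exits are deducted first.
    net_leaving = total_exits - transferred - unavoidable
    return net_leaving / average_employees * 100.0


# Hypothetical figures for one four-week period at one works.
rate = net_turnover_percent(total_exits=42, transferred=6, unavoidable=4,
                            average_employees=1600)
print(f"Net turnover: {rate:.2f} per cent")
```

How the average number of employees for the period is struck (for example, from the first-day and last-day counts) is not stated in the text, so it is taken here as a given input.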

The percentage distribution of "Total Exits" according to length 
of service provides information concerning the "employee plant age" 
at which employment is most likely to be terminated. 
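
A percentage distribution of this kind is simple to compute. The sketch below uses hypothetical exit counts and illustrative length-of-service classes; the classes actually printed on the form are not legible in this copy.

```python
def percentage_distribution(counts: dict) -> dict:
    """Express each class count as a percentage of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no exits to distribute")
    return {group: n / total * 100.0 for group, n in counts.items()}


# Hypothetical exits for one period, grouped by length of service.
exits_by_service = {
    "under 1 year": 18,
    "1 to 5 years": 12,
    "5 to 10 years": 6,
    "10 years and over": 4,
}

shares = percentage_distribution(exits_by_service)
for group, share in shares.items():
    print(f"{group:<18} {share:5.1f}%")

# The class with the largest share is the "plant age" at which exits concentrate.
modal_group = max(shares, key=shares.get)
print("Most exits occur at:", modal_group)
```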

The analysis of reasons for leaving, on the right-hand side of the 
form, is intended to provide detailed information concerning the under- 
lying causes of labor turnover. Continuous study of these reports aids 
management in two ways: (1) the personnel manager secures a background
of knowledge of what types of persons are most likely to
become permanent employees and (2) over-all management is able 
to detect unsatisfactory conditions that are causing a large separation 
ratio in any part of the organization. 

Conclusion 

The forms appearing in this section are not intended to be repre- 
sentative of all the prepared tables used in routine statistical work. 
The purpose is merely to present a few examples to show how the 
principles of tabulation are employed in practical work. The outstand- 
ing feature of all of the forms is the extent to which arrangement, 
ruling, spacing, and content have been co-ordinated to emphasize the 
major results contained. 

These forms, of course, are not intended for publication but are 
prepared for use by persons thoroughly familiar with their contents 
and purposes. Consequently many things which have required explana- 
tion in presenting the forms in this book are commonplace to those 
who use the forms regularly. This difference in background leads to 
a general observation of some importance to the budding statistician. 
Study of the principles and methods of statistics provides a sound basis 
for engaging in work of the type involved in preparing forms such as 
these examples, but general knowledge must be supplemented by par- 
ticular training to produce a practicing statistician. 

PROBLEMS 

1. What type of classification is employed in each of the following: 



6 The use of a percentage distribution of separations in the measurement of labor 
turnover is developed in chapter XII. 






COLOR OF HAIR      NUMBER OF STUDENTS IN CLASS
Light                    8
Red                      3
Brown                    7
Black                    2


CITY               BANK DEBITS, 1938
                   (Millions of Dollars)
Boston                  14,288
New York               168,778
Philadelphia            14,553
etc.


SIZE OF CITY            NO. OF DWELLING UNITS CONSTRUCTED
                        PER 10,000 POPULATION IN 1940
500,000 and over             48.6
100,000 to 500,000           56.8
 50,000 to 100,000           48.5
 25,000 to  50,000           67.4
 10,000 to  25,000           68.7
  5,000 to  10,000           67.3
  2,500 to   5,000           64.4
All urban                    57.5


YEAR      SHIPMENTS OF FINISHED STEEL BY
          UNITED STATES STEEL CORPORATION
          (1,000 net tons)
1937           14,098
1938            7,316
1939           11,707
1940           14,976







2. From recent issues of any business periodical find three one-way tables, 
each of which illustrates a classification according to a different kind of 
characteristic. Copy enough of the table to indicate the kind of classifica- 
tion. Give exact and complete references to the sources used. 

3. What are the distinguishing characteristics of primary tables and derivative 
tables? 

4. Which of the tables of problem 1 are primary and which are derivative? 

5. In Table 17, page 153, what information is primary and what is derivative? 

6. The following statistics have been published for the Bell Telephone System: 

"The number of manual service telephones declined from 10,705,118 at 
the end of 1930 to 9,659,349 at the end of 1931 while the dial service tele- 
phones increased in the same period from 4,976,941 to 5,730,645. This is 
a net decline in number of telephones of 292,065. The average number of 
telephone calls per day in 1930 was 62,365,000 local and 2,933,000 toll; 
in 1931 these had declined to 62,205,000 and 2,700,000 respectively, a 
total decline of 393,000 calls. The miles of wire in underground cable 
were 50,225,000 at the end of 1930, the miles of aerial cable were 20,- 
785,000. At the end of 1931 there were 52,214,000 miles of underground 
cable and 21,951,000 miles of aerial cable. There were 5,238,000 miles of 
open wire at the end of 1930 and 5,074,000 miles a year later." 

Present this information in tabular form, taking account of all the points of 
established practice in table construction. Does your table have unity? 
Why or why not? What is the degree of complexity? Explain. 






7. List the separate classifications present in Figure 24, page 159, and state 
the characteristic with respect to which each classification is arranged. 

8. Study the headings of tables in any year's Supplement to the Survey of Cur- 
rent Business. Write a paragraph on the use of table headings based on 
your findings. Give specific references. 

9. a) Consult Table 2, page 1, in any Statistical Abstract from 1932 to 1940. 

(1) Discuss the location of totals. (2) Discuss the arrangement of 
stub items. 

b) In the Statistical Abstract consult either Table 494, page 432, 1939; 
Table 472, page 408, 1938; Table 464, page 401, 1937; or Table 460, 
page 401, 1936. 

(1) Describe the method of classification and degree of complexity. 

(2) How many different units are there? Name them. Do you think 
it is justifiable to include all of them in one table? Why or why 
not? 

(3) Discuss any desirable or undesirable features in the table.

10. Describe in detail the order of arrangement of each of the following tables: 



A
DEATHS FROM CHIEF CAUSES IN
THE UNITED STATES, 1935

DISEASE         No. OF DEATHS
Heart               312,333
Cancer              144,065
Nephritis           103,516
Pneumonia           100,279
Accidents            99,967
etc.





B
BANK CENSUS, 1935

STATE          No. OF BANKS    No. OF EMPLOYEES    TOTAL WAGES
Alabama             251             2,123          $ 3,227,296
Arizona              39               492              848,587
Arkansas            260             1,416            1,905,105
California        1,083            19,523           38,675,923
etc.









C
RETAIL TRADE IN THE UNITED STATES, 1935
APPAREL GROUP

TYPE OF MERCHANDISE               SALES (in millions)
Men's furnishings                      $516
Men's clothing                          144
Family clothing                         359
Women's ready-to-wear                   795
Furriers and fur shops                   60
Millinery stores                         94
Custom tailors                           67
Accessories and other apparel           110
Shoe stores                             511



D
VALUE OF PUBLIC BUILDINGS ERECTED IN
CITIES IN NEW YORK STATE IN 1936

CITY           VALUE OF THE CONSTRUCTION
               (in thousands)
Buffalo             $   21
Rochester            1,491
Syracuse             1,108
Yonkers                 60
Albany                  95
Utica                   17
etc.






REFERENCES 

Bureau of Agricultural Economics, United States Department of Agriculture,
The Preparation of Statistical Tables, A Handbook. Washington, D. C.,
December, 1937.

A statement of the rules for table construction employed by a government
bureau.

DAY, EDMUND E., "Standardization of the Construction of Statistical Tables," 
Quarterly Publications of the American Statistical Association, Vol. XVII, 
No. 129 (March, 1920), pp. 59-66. 

A brief but complete statement of the principles of tabulation. 

DAY, EDMUND E., Statistical Analysis. New York: The Macmillan Co., 1925. 
Chapters IV and V contain a detailed discussion of classification of ob- 
servations. 

MUDGETT, BRUCE D., Statistical Tables and Graphs. Boston: Houghton Mifflin 
Co., 1930. 

Chapters I, II, and III contain a very lucid explanation of the principles of 
classification and tabulation. 

WALKER, HELEN M., and DUROST, WALTER N., Statistical Tables, Their Struc- 
ture and Use. New York: Bureau of Publications, Teachers College, Colum- 
bia University, 1936. 

A detailed discussion of the mechanics of table construction and the analysis 
of tabular material (examples from field of education). 



CHAPTER IX 
CLASSIFICATION OF LIBRARY SOURCES 

THE MEANING OF COLLECTION FROM LIBRARY SOURCES 

CHAPTERS IV to VIII have been devoted to the methods of 
securing data by direct investigation. Chapters IX and X will 
explain the procedures used in finding data that have already 
been collected and published. The discussion is introduced at this 
point because subsequent chapters deal with the steps of analysis which 
are applicable to data collected either directly or from library sources. 

"Library" is used as a general term descriptive of all published 
sources of business data. Some of the publications which are available 
to students only in public or school libraries may be kept on file cur- 
rently by an individual business concern, but the industrial statistician 
is also likely to be dependent upon libraries for long-time series or 
other than ordinary data. His method of procedure will not differ 
materially from that of the student in the search for published data 
needed in a given problem. 

Published sources of business data are for the most part current 
periodicals or yearbooks. How to become familiar with the contents 
of these publications presents a very real problem. A reference list of 
such sources and of their contents is of only temporary value since 
they are subject to constant changes. New publications may appear 
and older ones disappear; new series are added, older series are dis- 
continued, and the form of recording is altered. Consequently, in this 
chapter the emphasis is placed on various classifications of library 
sources, but no attempt is made to provide a complete list 1 of reference 
material. The next chapter will deal with the difficulties which may 
be encountered in securing data from these publications. 

METHODS OF CLASSIFYING SOURCES 

Published sources of data may be classified in a number of different
ways according to the point of view from which a problem is approached.
Five methods of classification are discussed in the succeeding
pages: (1) types of data contained, (2) form of publication, (3)
frequency of publication, (4) regularity of publication, and (5) publishing
agency. These are not all equally important but all of them
must be taken into account in acquiring familiarity with sources.

1 A selected list of sources is given in Appendix A at the end of the chapter. These
sources are numbered consecutively and whenever one of them is mentioned in the text
reference is made by number to the detailed description in the appendix.

Types of Data Contained 

Classification according to type of data is the least important method 
from a practical point of view, since it is applicable to only a few 
sources. All source books of business statistics are concerned with 
economic data, but as a rule they are not confined to a single phase 
such as production of raw materials, manufacturing, or marketing. 
Agricultural Statistics, 2 the Census of Manufactures, 3 and the Market
Data Handbooks 4 might be named as representative of these three
specific phases of the economic structure. A few other sources of
limited scope are: Statistics of Railways 5 (transportation) and Chain
Store Age 6 (a specialized type of retail trade). By far the greater
number of source books deal with several or all of the functions in
our economic system. Examples are: the Survey of Current Business, 7
Standard Trade and Securities Statistical Bulletin, 8 and the Commercial
and Financial Chronicle. 9

Some of the source books may be classified according to their scope 
in other respects, for instance, geographically. Many of them contain 
data for the entire United States; others are confined to a particular 
state, city, or local area. Still others present world data or data for 
several different countries. Frequently a single source will cover ter- 
ritorial or governmental subdivisions at various levels. For example, 
Agricultural Statistics 2 is mainly devoted to a complete compilation of 
data concerning agriculture in the United States with some information 
for other countries. In many tables, however, production statistics are 
subdivided by states, and price and market data by individual cities 
or regions. This source then is international, national, state, and local 
in scope, with the major emphasis on the national data. 

2 Appendix A, No. 23.
3 Appendix A, No. 12.

4 Appendix A, No. 6.

5 Appendix A, No. 42.

6 Appendix A, No. 72.

7 Appendix A, No. 1.
8 Appendix A, No. 59.

9 Appendix A, No. 54.




It can be concluded, therefore, that the majority of sources cover 
so wide a range of information that they cannot be classified according 
to specific types of data. This method of classification of source mate- 
rial, although theoretically sound, is not usable in the search for data. 

Form of Publication 

Statistical Source Books. Some publications consist almost entirely 
of statistical tables. Either the index or the table of contents can be 
used to find the data pertaining to a particular subject. Most publica- 
tions of this kind come from governmental agencies, although in recent 
years there has been a great increase in the amount of such work done 
by private organizations. Examples of the latter are the Standard 
Trade and Securities Statistical Bulletin, 10 Automobile Facts and
Figures, 11 and A Review of Railway Operations. 12

Auxiliary Sources. Sources which contain data as an auxiliary to
other functions are more difficult to use. Data may be scattered 
through the book or magazine in conjunction with articles to which 
they apply. This is the case with the Commercial and Financial 
Chronicle, 13 Business Week, 14 and Commerce Reports. 15 Other auxiliary
sources such as Dun's Review 16 and the Northwestern Miller 17
group most of their data in one section. 

In the great majority of such publications the data appear in tabular 
form. The tables have proper titles and whatever footnotes are neces- 
sary to explain any irregularities of the information. There are, 
however, a few cases in which valuable data are printed as text mate- 
rial. Careful attention is necessary in order to detect data published 
in this form and caution must be exercised in using them because 
necessary explanations may be far removed from the place in the text 
at which the data are found. It would be advantageous to all con- 
cerned if this practice were discontinued, but as long as it persists 
statisticians must be prepared to search for information appearing in 
that form. Newspapers avoid the use of tables with some regularity 

10 Appendix A, No. 59. 

11 Appendix A, No. 50. 

12 Published annually by the Association of American Railroads, Bureau of Railway
Economics, Washington, D. C.

13 Appendix A, No. 54.
14 Appendix A, No. 55.

15 Published weekly by the United States Department of Commerce, Bureau of Foreign
and Domestic Commerce.
16 Appendix A, No. 57.
17 Appendix A, No. 66.




because of the difficulty of adjusting them to narrow columns. The 
following paragraph illustrates the need for a table. 

Then, France is a country of handicraftsmen; even after recent and important
evolutions, such as the reconstruction of the northern departments and the return 
of Alsace-Lorraine, it has not decidedly become a country of great industry. 
Out of 21,721,000 people given to active occupations, only 6,181,000 28% 
belong to the industries of transformation. Among those 6,181,000 only 4,027,- 
000 65% are regular industrial workers; 683,000 11% are employers, 
which shows the great number in France of small employers; 1,162,000 
18% work alone independently or are not regularly connected with employers. 
Out of 4,000,000 of regular workingmen, only 774,000 19% are employed 
in factories of more than 500 workers! The conclusion is that France is a 
country of artisans; the village joiner, the motor-car mechanic, the couturiere 
even in the great maison de couture, the vine grower, the gardener who raises 
vegetables or fruits, all belong to that type of workers, and they are undoubtedly 
the most typical of the French people. 18 
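
To show what the quoted paragraph gains from tabulation, the sketch below arranges its figures as a small table and recomputes each percentage on its stated base. The counts are those printed in the quotation (in thousands); the variable names and row labels are illustrative only.

```python
# Figures as printed in the quoted paragraph (thousands of persons).
ACTIVE_POPULATION = 21_721
TRANSFORMATION_INDUSTRIES = 6_181
GROUPS = {
    "Regular industrial workers": 4_027,
    "Employers": 683,
    "Independent or irregular workers": 1_162,
}
IN_LARGE_FACTORIES = 774  # factories of more than 500 workers


def pct(part: float, whole: float) -> float:
    """Part as a percentage of whole."""
    return part / whole * 100.0


rows = [("Industries of transformation", TRANSFORMATION_INDUSTRIES,
         pct(TRANSFORMATION_INDUSTRIES, ACTIVE_POPULATION), "active population")]
for label, count in GROUPS.items():
    rows.append((label, count, pct(count, TRANSFORMATION_INDUSTRIES),
                 "industries of transformation"))
rows.append(("Workers in large factories", IN_LARGE_FACTORIES,
             pct(IN_LARGE_FACTORIES, GROUPS["Regular industrial workers"]),
             "regular industrial workers"))

print(f"{'Group':<34}{'Thousands':>10}{'Per cent':>10}  Base")
for label, count, share, base in rows:
    print(f"{label:<34}{count:>10,}{share:>10.1f}  {base}")

# Persons in the industries of transformation not assigned to any group.
unclassified = TRANSFORMATION_INDUSTRIES - sum(GROUPS.values())
print(f"Not accounted for in the paragraph: {unclassified:,} thousand")
```

Set out this way, two points that are easy to miss in running text become visible: the three groups named leave about 309,000 persons unaccounted for, and the share of independent workers works out nearer 19 per cent than the 18 per cent printed.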

Variability of Content. There is a variation in the form of pub- 
lishing data which applies to sources devoted entirely to statistical 
information as well as to those that publish such information inci- 
dentally. The most convenient publications to use are those which 
contain the same series of data in each issue, such as the monthly 
Survey of Current Business, 19 the Monthly Labor Review, 20 and the
Standard Trade and Securities Statistical Bulletin. 21 On the other hand
Crops and Markets 22 and Steel 23 present whatever monthly or weekly
data are available at the time of publication. 

Frequency of Publication 

In discussing this classification we will proceed from those sources 
which appear most frequently to those which have longer intervals 
between publication dates. 

Daily. Daily papers then are the first source on the list. The 
financial section of the paper contains information on a variety of sub- 
jects. The great virtue of the daily paper is its ability to place data 
in the hands of its readers quickly. The element of speed tends to 

18 Andre Siegfried, "French Industry and Mass Production," Harvard Business Review,
Vol. VI, No. 1 (October, 1927), p. 2.
19 Appendix A, No. 1.

20 Appendix A, No. 16. 

21 Appendix A, No. 59. 

22 Appendix A, No. 24. 

23 Appendix A, No. 62. 




reduce accuracy; hence the data found in daily papers are sometimes 
not reliable and for this reason they should be verified in other sources 
as soon as possible. 

There are a number of daily publications which are valuable sources 
because they deal with particular subjects. Among these the Wall 
Street Journal, 24 the New York Journal of Commerce, 25 and the American
Metal Market 26 are typical. There are also many daily reports
issued by government agencies, such as the daily Treasury Statement 27
and daily produce market reports issued by state departments of 
agriculture. 

Weekly. There is an increasing tendency toward weekly compila- 
tion and publication of data to meet the demands of business men for 
information as nearly current as possible. The demand is further evi- 
dence of the extent to which numerical facts have become useful in 
determining business policy. Accordingly, the Bureau of Labor Statis- 
tics now computes its Index of Wholesale Prices weekly. Data such 
as car loadings, bank debits, and electric power production are available 
weekly. Among weekly publications the Commercial and Financial 
Chronicle, 28 the Weekly Supplement to the Survey of Current Business, 29
and Iron Age 30 are widely used.

Monthly. Monthly publications also attempt to put information in 
the hands of users as soon as possible. Sometimes the data for one 
month are available as early as the 10th of the following month. More 
commonly the data are one or even two months old before they appear 
in print. Some important monthly publications are the Federal Reserve 
Bulletin, 31 the Survey of Current Business, 29 and bank reports such as
the Business Bulletin of the Cleveland Trust Company. 32

Annually. Other valuable sources appear annually. Most impor- 
tant of these are the yearbooks which contain a great amount of basic 
data with some series running back for long periods. Yearbooks 
require much preparation, consequently the data may be several months 
or even a year old before the book is published. Among the valuable 



24 Appendix A, No. 58.

25 Journal of Commerce Corporation, New York.

26 American Metal Market Company, New York.

27 United States Treasury Department, published in daily newspapers.
28 Appendix A, No. 54.

29 Appendix A, No. 1.

30 Appendix A, No. 61.

31 Appendix A, No. 34.

32 Cleveland Trust Company, Cleveland, Ohio.




yearbooks the Statistical Abstract 33 and Agricultural Statistics 34 may be
mentioned. Several newspapers in various parts of the country publish 
yearly almanacs which contain a large amount of statistical data. 
These are not usually considered to be authoritative source books 
but they serve as convenient guides to data which may be found 
elsewhere. 

Longer Intervals. Examples of sources appearing less frequently 
are the volumes of the Census of Population 35 which are published at
ten-year intervals, the Census of Agriculture 36 which has been published
along with the Census of Population since 1860 and quinquennially
since 1925, and the Census of Manufactures 37 which has been
published along with the Census of Population since 1850, was
also published in 1905 and 1914 and has appeared biennially since
1919.

Special Releases. Recognizing the necessity of saving time, many 
of the government bureaus release their more important data as soon 
as possible. In some instances the data contained in these releases may 
be preliminary and may subsequently be revised in a regular publica- 
tion. In other cases the data are not reprinted at any time. The Bureau 
of Mines has adopted the practice of issuing each chapter of its Yearbook 38
separately in advance of the complete bound volume. The
Bureau of Labor Statistics distributes special processed bulletins on 
wholesale and retail prices, cost of living, and employment and pay- 
rolls, but the greater part of this material is reproduced in subsequent 
issues of the Monthly Labor Review. On the other hand the Bureau 
of Census releases information concerning the Census of Manufactures
in both processed and printed form, much of which is never
reprinted in the bound volumes. The Bureau of Public Roads of the 
Department of Agriculture distributes various printed releases con- 
cerning state and federal gasoline taxes and automobile registrations 
and license fees. Some of this information is not reprinted in bound 
volumes. Knowledge of these various releases must be acquired by 
experience since they are not always included in check lists of gov- 
ernment publications. 

33 Appendix A, No. 8.

34 Appendix A, No. 23.

35 Appendix A, No. 10.

36 Appendix A, No. 11.
37 Appendix A, No. 12.
38 Appendix A, No. 23.




Regularity of Publication 

Some published sources appear at regular intervals; others appear
irregularly.

Regular. The question of regularity is important from several 
points of view. Business men have learned to expect weekly publica- 
tions to be delivered in a certain mail. They delay decisions until the 
arrival of the latest statistical report. Absolute regularity of publication 
is required to meet this demand. Statisticians depend upon regular 
publications for current data in carrying on their research work. The 
business community in general expects to receive regular information 
from weekly or monthly publications. In other cases no exact date of 
publication is observed but publication is certain each week, or month, 
or quarter, or year. Regularity is, of course, a great virtue in source 
material and the great majority of publications possess it. 

Irregular. There are other publications which appear irregularly 
although the data which they contain are collected quite regularly. 
The Census of Manufactures contains data that are collected biennially 
but are published whenever the Department of Commerce is able to 
prepare the data and funds are available to meet the cost. Only those 
irregular publications that are already in print can be included in the 
plan of an investigation. On the other hand regular publications which 
are scheduled to appear while an investigation is in progress can be 
included. An example will perhaps clarify this distinction. As this is 
being written parts of the results of the 1940 Census of Population 
have been released. It would not be possible, however, to plan an 
investigation requiring the use of the complete Census or any unpub- 
lished part of it because there is no way of knowing when the needed 
data will be published. 
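
The planning rule just described can be restated as a small check. What follows is a minimal Python sketch and not part of the original text; the names `Source` and `can_plan_on`, and the dates used in the example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Source:
    """A published source as seen when planning an investigation (hypothetical record)."""
    title: str
    regular: bool                      # True if it appears on a dependable schedule
    in_print: bool                     # True if the needed data are already published
    next_issue: Optional[date] = None  # scheduled date of the needed issue, if known


def can_plan_on(source: Source, investigation_ends: date) -> bool:
    """Return True if the source may be included in the plan of an investigation."""
    if source.in_print:
        # Anything already in print can be used, regular or not.
        return True
    if source.regular and source.next_issue is not None:
        # A regular publication counts if it is due before the work ends.
        return source.next_issue <= investigation_ends
    # An irregular publication not yet in print has no knowable release date.
    return False


# Illustrative dates only; they are assumptions, not taken from the text.
deadline = date(1941, 6, 30)
census = Source("Census of Population, 1940 (unpublished parts)",
                regular=False, in_print=False)
survey = Source("Survey of Current Business",
                regular=True, in_print=False, next_issue=date(1941, 3, 1))
print(can_plan_on(census, deadline))  # False
print(can_plan_on(survey, deadline))  # True
```

The sketch treats "already in print" as sufficient in every case and asks for a scheduled date only when the data have not yet appeared, which is the distinction the Census example is meant to show.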

Special Studies. While the most valuable sources are those which 
are published at regular intervals and which consequently keep the 
information up-to-date, there are many special studies that appear from 
time to time containing data which are available in no other sources. 
These studies are reports of special researches. Usually they are 
models of the application of statistical methods in practical work. 
They are consequently valuable references for data, methods of 
analysis, and form of presentation. Examples of this type of work are 
the Cost of Living Studies of the Department of Labor made in 1918 
and in 1936. 39

39 Appendix A, No. 22.




There are also important non-government publications of this type 
among which the Retail Clothing Survey made by Northwestern Uni- 
versity 40 and the Study of Income in the United States, 1909-19, by 
the National Bureau of Economic Research 41 are excellent examples. 

Publishing Agency 

This classification of sources has probably the greatest practical 
value in library research. The publications are usually catalogued on 
this basis in the libraries. In discussing this classification it will be 
well to recall that only those sources that are related to the field of 
business are included, no attempt being made to include sources of 
statistical data in other fields. 

United States Government. The most important publishing agency 
is the federal government, chiefly through the executive branch which 
includes all the departments and independent offices. 42 An outline of 
its plan of organization is given in Figure 32. Some of the departments 
and certain bureaus and offices in particular produce a tremendous 
amount of statistical data, whereas others by the very nature of their 
work produce none. In searching for data, one quickly becomes 
familiar with the Bureau of Foreign and Domestic Commerce and the 
Bureau of the Census in the Department of Commerce; the Bureau 
of Labor Statistics in the Department of Labor; the Bureau of Agri- 
cultural Economics in the Department of Agriculture; the Bureau of 
Internal Revenue in the Department of the Treasury; the Bureau of 
Mines in the Department of the Interior; the Interstate Commerce 
Commission; the Federal Trade Commission; and the Board of Gov- 
ernors of the Federal Reserve System. 

The list of sources given in Appendix A at the end of the chapter 
is arranged according to publishing agency and includes the most im- 
portant statistical publications of these bureaus and other offices. 

In addition to the regular government publications which are usually 
issued as periodicals or yearbooks, there are often useful data in the 
annual reports of department, bureau, and division heads. Several of 

40 Costs, Merchandising Practices, Advertising and Sales in the Retail Distribution of
Clothing, Vol. I-VI, 1921. Selling Expenses and Their Control in the Retail Distribution
of Clothing, Vol. VII, 1922. New York: Prentice-Hall, Inc.

41 Volumes I and II of the Publications of the National Bureau of Economic Re-
search. New York.

42 The United States Government Manual, published three times yearly by the Office of
Government Reports, Washington, D. C., gives complete and up-to-date information on the
organization and activity of all subdivisions of the federal government. The outline in 
Figure 32 is reproduced from this source. 




these are listed in Appendix A. Another is the Annual Report of the 
Commissioner of Immigration (Department of Labor), which gives 
data concerning immigrants and emigrants in more detail than can be 
found in any other publication. Some annual reports are available 
only as numbered documents of the Congress to which they were sub- 
mitted, but others are published separately. 

Finally, the special investigations made for congressional committees 
should be mentioned. These are usually detailed studies of a particular 
subject and as such are unique sources. They are likewise usually pub- 
lished as Congressional Documents. Excellent examples are the Marine 
Insurance Investigation of 1920 43 and the Chain Store Investigation of 
1929-33. 44 

State and Municipal Government. In many cases the most prolific 
sources for information concerning the individual states are the publi- 
cations of the federal government that have already been mentioned. 
In addition there are many publications by the state governments them- 
selves. The latter vary so much from state to state that an attempt 
to list them would not be feasible. Some states are far in advance 
of others in furnishing statistical information to their citizens. A few 
have begun the publication of yearbooks similar in plan to the Statis- 
tical Abstract of the United States. A list of sources of market data 
available for the various states is given in Market Research Sources. 45
This list can be supplemented by consulting the library card catalog 
under the individual states. Information is likely to be found in the 
publications of the state departments of Agriculture, Banking and 
Insurance, Labor, and Highways. The publications of the Land Grant 
Colleges and other state institutions also contain valuable special data. 

Very few source books are published by municipalities, but con-
siderable local information is available in federal and state publica-
tions. For example, the Census of Manufactures 46 includes data for
individual cities; likewise the Monthly Labor Review 47 gives a retail
food price index monthly for 51 cities; and the Industrial Bulletin 48



43 S. S. Huebner, Report of Status of Marine Insurance in the United States, and Re- 
port on Legislative Obstructions to the Development of Marine Insurance in the United 
States. 

44 Investigation for the Senate Committee on the Judiciary by the Federal Trade Com-
mission. Published in several parts in 1933 as Numbered Documents of the 72nd Congress.

45 Appendix A, No. 7. 
46 Appendix A, No. 12.

47 Appendix A, No. 16. 

48 New York State Department of Labor. 




provides monthly data on employment and payrolls for various cities 
in New York State. 

Non-Government. State and local information is also available 
through the publications of certain semi-public organizations. The 
best examples are the monthly reviews of business issued by the 12 
Federal Reserve Banks 49 and reports of statistical studies by the 
research bureaus of universities. 

In addition there are many private agencies which publish statis- 
tical data for general use. In a majority of cases the data are collected 
for the use of an interested group such as the members of a trade 
association or the subscribers to a service, but are made generally 
available through magazines and trade papers. There are other cases 
in which data are collected and published merely to increase the value 
of a magazine to the reading public. In any event the cost of making 
the data available must be borne by the subscribers to the publication. 
Historically, private agencies preceded the government in supplying 
current data to the public. Among the pioneers in this field were 
Dun's Review, Bradstreet's Review, The Commercial and Financial
Chronicle, and Babson's Service. 

The private agencies compiling and publishing statistical informa- 
tion may be classified as follows: trade, industrial, and financial asso- 
ciations; financial magazines; statistical services; and trade and 
industrial magazines. The outline of sources used in Appendix A 
conforms to this one. The examples given there are some of the most 
commonly used non-government sources. The list has purposely been 
confined to only a few of the multitude of publications which might 
have been included. Many of those omitted contain data of value; 
hence the need for gradually expanding one's knowledge of them as 
progress is made in the use of source material. 

Foreign and International. While the agencies already named 
provide most of the data needed for statistical work, there are occa- 
sions which call for the use of information from foreign countries or 
for world data. A list of sources published in foreign countries as well 
as publications containing world data can be found in The Economists' 
Handbook: A Manual of Statistical Sources. 50

49 Appendix A, No. 36. These are actually private organizations but because of their
close integration with the Federal Reserve System their bulletins have been listed as gov-
ernment publications.

50 Verwey and Renooij, Amsterdam, 1934.




Summary 

The use of published sources of business data requires a knowledge 
of government and non-government publishing agencies and the titles 
of their publications, as well as a knowledge of the form, frequency, 
and regularity of publication and the type of data contained. The 
classification according to publishing agency is the most generally 
usable one; hence it has been employed in Appendix A. 

It is necessary to remember, however, that source material is con- 
tinually changing with respect to all of these classifications. That is, 
the type of data may be changed by the addition of new series and 
the elimination of old ones, or by changing the titles of series and 
the data included in them. Likewise the form of publication may 
change. Data formerly scattered throughout the publication may be 
brought together in a statistical appendix or published separately. On 
the other hand statistical compendiums may be abandoned. Again the 
frequency of publication changes when a weekly publication becomes 
monthly or the reverse; when an annual is supplemented by a monthly 
and/or weekly or when a weekly or monthly issue begins publication 
of an annual supplement. The regularity of publication also undergoes 
changes. Sources which have appeared regularly for years may be 
discontinued entirely 51 or subsequently may appear irregularly. Finally, 
changes occur in publishing agency and in titles of publications. For 
example, the material formerly found in Bradstreet's Review is now
found in Dun's Review; the Bureau of Mines of the Department of 
Interior was for several years in the Department of Commerce; the 
former Commerce Yearbook, Volume II, Foreign Commerce is now 
the Foreign Commerce Yearbook. 

In the light of these changing conditions it may readily be under- 
stood that any list such as that given in Appendix A loses its accuracy 
after a few years. It is therefore necessary for users of published 
source material to keep abreast of current changes as they occur. 
The list given in the appendix provides an adequate nucleus which 
can be kept up-to-date by noting additions and changes from time to 
time. 
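
One way to picture the practice of keeping such a list current is as a set of records, one per source, each carrying a running note of changes. The following is a minimal Python sketch offered only as an illustration; the class `SourceEntry`, its field names, and the method `record_change` are hypothetical, and only the retitling of the Commerce Yearbook is taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List

# Fields that a change note is allowed to update (hypothetical layout).
_EDITABLE = ("agency", "title", "frequency", "regularity")


@dataclass
class SourceEntry:
    """One line of a source list arranged by publishing agency."""
    agency: str
    title: str
    frequency: str                  # e.g. "weekly", "monthly", "annually"
    regularity: str = "regular"     # e.g. "regular", "irregular", "discontinued"
    notes: List[str] = field(default_factory=list)

    def record_change(self, note: str, **updates: str) -> None:
        """Apply the given field updates and keep a dated or descriptive note."""
        for name, value in updates.items():
            if name not in _EDITABLE:
                raise AttributeError(f"unknown field: {name}")
            setattr(self, name, value)
        self.notes.append(note)


entry = SourceEntry(
    agency="Bureau of Foreign and Domestic Commerce",
    title="Commerce Yearbook, Volume II, Foreign Commerce",
    frequency="annually",
)
entry.record_change("Title changed", title="Foreign Commerce Yearbook")
print(entry.title)  # Foreign Commerce Yearbook
print(entry.notes)  # ['Title changed']
```

Keeping the note alongside the updated field preserves the older title or agency, which is what a reader needs when tracing a series back through earlier volumes.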



51 One of the most useful references, The Annalist, heretofore published weekly by
the New York Times Co., was discontinued in October, 1940. Although many of the 
series of data carried by this publication are no longer available currently, the volumes 
for earlier years contain valuable data. 




APPENDIX A 

SELECTED SOURCES LISTED ACCORDING TO PUBLISHING AGENCY, TITLE, 

FREQUENCY OF PUBLICATION, AND CONTENTS 

(REVISED TO OCTOBER, 1940) 

UNITED STATES GOVERNMENT SOURCES 
Department of Commerce, Bureau of Foreign and Domestic Commerce

1. Survey of Current Business (monthly and weekly, with occasional yearly 
supplements) 

The monthly issues present data for the United States concerning 
business indexes, commodity prices, construction and real estate, domestic 
and foreign trade, employment, finance, transportation and communication, 
and statistics of industry in 12 general subdivisions; also some Canadian 
data. Brief summary statements and tables are given concerning each, 
followed by monthly statistics that cover the preceding 13 months. 

The weekly pamphlet brings some of the monthly series up-to-date in 
advance of the monthly issue, and gives weekly data for a few important 
series. 

The supplements to date have been issued for the years 1931, 1932, 
1936, 1938, and 1940. Each covers a number of years, and taken together 
(except for subsequent revisions) they furnish continuous monthly data, 
including monthly averages for every year since each series has become 
available. Full notes are appended explaining the sources and methods of 
construction of each series. 

2. Monthly Summary of Foreign Commerce of the United States (monthly) 

Gives dollar value and quantity of all goods exported from and 
imported into the United States including gold and silver. Includes a
detailed classification of both exports and imports by articles and by 
customs districts. Since each month's issue contains data for that month 
and the calendar year to date the December issue contains the total for 
the year. 

3. Domestic Commerce (weekly) 

Presents digests of important studies both government and non- 
government, summaries of federal bills, laws, and court decisions, also 
statements of recent publications by the various government agencies, a list 
of recent publications dealing with domestic commerce, and some data 
regarding changes in wholesale and retail trade along specific lines. 

4. Foreign Commerce Yearbook (annually) 

Purpose is "to provide in a single volume all the important basic 
statistical material essential for a comprehension of current economic 
developments in foreign countries." Part I gives data for each country 
separately, total trade and trade with the United States being given by 




specific commodities; Part II gives comparative world statistics on 
population, production in agriculture and industry, and trade. 

5. Foreign Commerce and Navigation of the United States (annually) 

Detailed tables of specific items of export and import by countries; 
also number and tonnage and ports of arrival and clearance of American 
and foreign vessels. 

6. Consumer Market Data Handbook, 1939 (intervals of 3 or 4 years) 

A nation-wide survey of the markets for consumer goods presenting 
"on a comparable basis for all counties, and in most cases for all urban 
communities, just how much consumers spent in retail stores, and in 
service, amusement and hotel establishments; what wholesale business 
amounted to; how many of the consumers' homes had telephones and 
electric meters; how many persons made out an income tax return; how 
many passenger automobiles were registered; what the relief load amounted
to; and other factors indicative of the relative importance of each market." 

An Industrial Market Data Handbook was also published in 1939, 
giving data regarding the industries in every county in the United States. 

7. Market Research Sources (biennially) 

Complete and thoroughly cross-referenced lists of all government and 
non-government sources relating to problems of domestic marketing. 

Department of Commerce, Bureau of Census

8. Statistical Abstract of the United States (annually) Prior to 1938 pub- 
lished by Bureau of Foreign and Domestic Commerce. 

"A digest of data collected by all statistical agencies of the national 
government," as well as by some states and private agencies. It consists 
entirely of summary tables and time series, chiefly annual data, with notes 
defining the scope and terms used in each, and referring to the original 
sources. 

9. Abstract of the Census (decennially) 

A selection of the most essential statistics collected on all subjects at 
each census, in one volume. Data are given by subjects, states, and cities, 
and some by counties and smaller civil subdivisions. Some data are included 
covering outlying territories and possessions of the United States. 

10. Census of Population (decennially) 

Data are classified by states, counties, cities or villages, and minor 
civil subdivisions. The subjects included are: color, race, nativity, parentage, 
origin of foreign born, sex, marital condition, age, urban and rural 
distribution, citizenship, school attendance, and literacy. Separate volumes 
give data on occupations and families, and sometimes unemployment. 
Reports on special groups or subjects are published at irregular intervals 




in intercensal years, such as Religious Bodies, Benevolent Institutions, The 
Blind and the Deaf, Negroes in the United States, etc. A series of 
Mortality Statistics is published annually. 

11. Census of Agriculture (quinquennially) 

The year which coincides with a decennial census affords somewhat 
more detailed information than the intervening non-census year. In both, 
data are given by states and counties, for the number of farms, color and 
tenure of farm operator, uses of farm land, value of land and buildings, 
acreage, production and value of specified crops, and value of livestock by 
principal classes and age groups. In the decennial years, classifications are 
made also by minor civil subdivisions, and special reports such as irrigation 
are included. 

12. Census of Manufactures (biennially) 

The reports for years which coincide with a decennial census are given 
in greater detail than those taken in the intervening non-census years. 
Mines and quarries have been covered only in the decennial years (see 1935 
report under Census of Distribution). All reports give data concerning 
number of establishments; number of wage earners and salaried employees;
amount of wages and salaries paid; cost of materials, fuel, and power; 
value of products; and value added by manufacture. Classifications are by 
industry groups, by states, by cities, and in some years, notably 1929, 
by industrial areas. 

13. Census of Distribution, and Business Censuses 

(1) Census of Distribution, 1929

This first attempt to gather nation-wide business statistics was 
a part of the fifteenth census. It included retail trade, wholesale 
trade, distribution of manufacturers' sales, contract construction, and 
hotels. 

(2) Census of American Business, 1933 

This covered the same field as the 1929 Census of Distribution,
with the addition of services and places of amusement. Both afford 
data on number of establishments, net sales or receipts, personnel, and 
payroll. They are classified by field and kind of business, and by 
states, with some basic data for cities and counties. 

(3) Census of Business, 1935 

This is more comprehensive than either of the preceding censuses. 
Subjects added are: transportation and warehousing, tourist camps, 
radio broadcasting and advertising agencies, banking and finance, 
insurance, mines and quarries. 

(4) Census of Business, 1939 (part of sixteenth census) 

Will be practically the same as 1935. 




14. Financial Statistics of Cities (annually) 

Shows the financial transactions of cities having a population of over 
100,000, including taxes, indebtedness, specified assets, government costs 
and receipts. 

15. Financial Statistics of State and Local Governments (published decennially 
several years subsequent to census) 

"Statistics relating to revenue receipts, governmental cost payments, 
public debt, and assessed valuations and tax levies, for all divisions of
government." The classifications are by states, counties, cities, towns, 
villages, boroughs, school districts, townships, and other civil divisions. 
Financial Statistics of States is also published annually when funds permit, 
but there was none between 1931 and 1937. 

Department of Labor, Bureau of Labor Statistics

16. Monthly Labor Review (monthly) 

Contains brief reports and complete statistical data on all matters 
handled by the department. Each issue contains sections on industrial 
disputes, wages and hours of labor, employment and payrolls, wholesale 
and retail prices, and cost of living, and usually other sections on labor 
legislation, industrial accidents, etc. There are always a few special articles 
on timely subjects concerning labor, and a list of recent publications by 
the department. 

17. Wholesale Prices (monthly) 

Monthly index numbers of wholesale commodity prices by groups and 
subgroups. Each issue gives the group and subgroup indexes and detailed 
indexes for certain groups. Comparisons are given with the same month in 
previous years, and with foreign countries. The December issue gives 
indexes for the 12 months of the year for the entire series of more than 
800 commodities. 

18. Retail Prices (monthly) 

Index numbers of retail prices of food, coal, electricity, gas, and other 
consumers' goods. Food data are given every month; coal, electricity and 
gas at frequent intervals; and other commodities less frequently. 

19. Changes in the Cost of Living (quarterly) 

Index numbers of changes in the cost of living, divided into 6 groups: 
food; clothing; rent; fuel, electricity, and ice; house furnishings; and mis- 
cellaneous, for 33 cities. 

20. Employment and Payrolls (monthly) 

Index numbers of employment, payrolls, hours worked, and weekly 
earnings for all manufacturing and non-manufacturing industries. It in- 
cludes data for employment and payrolls in the regular executive depart- 
ments of the federal government and on emergency work. 




21. Labor Information Bulletin (monthly) 

Brief summary of labor conditions. Hours of work, wages, cost of 
living, employment and payrolls, wholesale prices, retail food prices, indus- 
trial production and trade, agriculture, and government employment and 
relief for the month. 

22. Numbered Bulletins (irregular, several each year) 

"Each bulletin contains matter devoted to one of a series of general 
subjects, those subjects being Wholesale Prices, Retail Prices and Cost of 
Living, Wages and Hours of Labor, Employment and Unemployment," 
and many other subjects of interest to labor, but not of a statistical nature. 
Bulletin No. 661 gives a selected list of these bulletins as of 1938, and the 
most recent ones are listed on the back cover of each Monthly Labor 
Review. In recent years the tendency appears to be not to issue these 
bulletins on subjects covered by the regular monthly pamphlets listed above, 
so that the majority of the current bulletins deal with Wages and Hours of 
Labor in specific industries. One of the most important of the series is 
No. 357, Cost of Living in the United States, published in 1924 and giving 
a complete statistical report of the first extensive cost-of-living study, made
in 1918-19. The more recent study made in co-operation with the Works 
Progress Administration is being reported in a series of bulletins beginning 
in 1936 under the general title "Studies of Consumer Purchases." 

On June 30, 1939 a special pamphlet (unnumbered) entitled Publica- 
tions of the Department of Labor was issued. This contains a complete 
list of all the publications of the various bureaus of the Department of 
Labor since it was organized. Of particular value are the detailed descrip- 
tions of changes of form and content which were made during the entire 
period in the various series published by the department. 

Department of Agriculture 

23. Agricultural Statistics (annually) 

Prior to 1936 was included in the Yearbook of Agriculture. Presents 
summary tables of all statistical data appearing in periodicals of the depart- 
ment, in great detail, usually covering a series of years. These include data 
on all United States crops, livestock, poultry and dairy products; farm 
business; foreign trade in agricultural products; and some data on world 
production. 

24. Crops and Markets (monthly) 

Each issue gives complete detailed reports, estimates, and forecasts on 
United States crops and other farm products, the items included varying 
with the seasons. Sectional data are given, and comparison with preceding 
years. Prices, wages, labor supply, stockyards, and market reports, and 
some items of export and import are included. 




Department of the Interior, Bureau of Mines

25. Minerals Yearbook (annually) 

Gives a general survey of mineral production in the United States and 
the world, and separate chapters dealing with each metal and non-metal. 
It contains the most recent compilation of statistical data on the more im- 
portant minerals: coal, natural gas, petroleum, stone, gold, silver, copper, 
lead, and zinc, etc. 

26. Several weekly and monthly bulletins on production and distribution of 
anthracite coal, bituminous coal, and coke. 

War Department 

27. Report of the Chief of Engineers, U. S. Army, Part 2, Commercial Statistics 
(annually) 

A complete review of water-borne commerce of the United States both 
domestic and foreign, freight and passenger, subdivided by grand divisions, 
by ports, and by commodities. All data are annual for calendar years. 

Post Office Department 

28. Annual Report of the Postmaster General (annually) 

Contains detailed statistical analysis of all receipts and expenditures of 
the department, for the fiscal year ending June 30; the number of post 
offices and employees; mail carried by each type of carrier; and money 
order transactions. 

Treasury Department 

29. Annual Report of the Secretary of the Treasury on the State of the Finances 
(annually) 

Statistical data for the fiscal year ending June 30 on receipts, expendi- 
tures, deficit, public debt, and monetary developments. The "exhibits" on 
public debt contain statements of all outstanding obligations (bonds, treas- 
ury notes, treasury bills, treasury savings certificates, and currency) issued 
by the United States government. Also list of securities owned by United 
States government, and statement of assets and liabilities of government 
corporations and credit agencies of the United States. 

30. Annual Report of Comptroller of Currency (annually) 

Report for the fiscal year ending October 31 covering in great detail 
all banking operations in the United States and money in circulation. State- 
ments are included of Reconstruction Finance Corporation, Farm Credit 
Administration, Federal Home Loan Bank System, Federal Deposit Insur- 
ance Corporation, Pacific National Agricultural Credit Corporation, and 
United States Postal Savings System. 




31. Combined Statement of Receipts and Expenditures, Balances, etc., of the 
United States (annually) 

Very detailed statement of receipts and expenditures of each depart- 
ment and independent office. 

Treasury Department, Bureau of Internal Revenue

32. Statistics of Income (published annually, about 2 years late) 

Detailed data on income tax returns by individuals, partnerships, and 
corporations, estate tax returns, and gift tax returns for the United States 
and individual states. Data for counties, cities, and towns are available in 
separate mimeographed bulletins. Beginning with 1934 corporation tax 
returns are published separately as Part II. 

33. Annual Report of Commissioner of Internal Revenue (annually) 

Detailed statistical report for the fiscal year ending June 30 on all tax 
revenue of the United States, including income taxes and all other mis- 
cellaneous taxes. A few of the tables give monthly data and comparison 
with previous years. 

Board of Governors of Federal Reserve System 

34. Federal Reserve Bulletin (monthly) 

The only official statement by the Board concerning the operations of 
Federal Reserve banks and member banks. Monthly or weekly data are 
given concerning financial, industrial, and commercial statistics in the 
United States; international financial statistics; and several indexes con- 
structed by the Division of Research and Statistics of the Federal Reserve 
Board on industrial production, construction, employment and payrolls, 
freight car loadings, and department-store sales. Summaries and discussion 
of current financial events, legislation, etc., appear in each issue. 

35. Annual Report of the Board of Governors of the Federal Reserve System 
(annually) 

Data similar to those in the monthly issues, but given for a series of 
years, some dating back to 1914. 

36. Monthly Publications of Federal Reserve Districts (monthly) 

The Federal Reserve Bank of each of the 12 districts publishes a 
monthly bulletin summarizing business conditions in that district. 

Federal Home Loan Bank Board 

37. Federal Home Loan Bank Review (monthly) 

Contains data concerning housing and building conditions including 
building permits, building costs, mortgages, building and loan association 
activity, government housing activity. It includes the monthly operating 




statement of the Home Owners Loan Corporation and the financial state- 
ment of the Home Loan Banks. Important special articles on housing 
appear in each issue. 

38. Annual Report of Home Loan Bank Board (annually) 

Contains data for recent years similar to those in monthly issues. 

Federal Power Commission 

39. National Electric Rate Book and State Rate Books (periodic intervals) 

40. Monthly Bulletin (monthly, and an annual summary) 

Production of electric energy in the United States, sources of energy by 
states, average daily production by public utility plants. 

Federal Communications Commission 

41. Operating Data from Monthly Reports of (a) Telephone; (b) Telegraph 
Carriers (monthly) 

a) A report of detailed operating revenues, operating expenses, income 
items and changes in capital items, by regions for telephone carriers 
giving the current month and cumulative totals for the year to date. 

b) A report of detailed revenue, expenses, and income of individual tele- 
graph carriers giving the current month and cumulative totals for the 
year to date. 

Interstate Commerce Commission, Bureau of Statistics

42. Statistics of Railways in the United States (annually) 

Summary statements concerning equipment, employees, revenues, ex- 
penses, and other data for all steam railways in the United States usually 
classified by districts. There are also separate reports from each company. 

43. Annual Report of the Interstate Commerce Commission (annually) 

Contains a statistical appendix giving data on railway development for 
a preceding period of years. Also contains miscellaneous summaries of data 
on operating revenue, expense and income, operating ratios, employment, 
and car loadings. 

44. Freight Commodity Statistics Class I Steam Railroads in the United States 
(annually) 

Includes annual data on car loadings by districts and groups of com- 
modities for the preceding ten years, also quarterly data by districts and 
commodity groups for the latest available year. The data are also broken 
down into individual commodities carried by individual railroads. 




A quarterly supplement to this report, having the same title and giving 
the car loadings for the most recent quarter classified by districts and by 
individual commodities is also published. 

45. Wage Statistics of Class I Steam Railways in the United States (monthly) 

A complete statement by occupations of the number employed, time 
worked, and wages received, with summaries. 

46. Statistics of Class I Motor Carriers (annually) 

Data regarding motor transportation of property and passengers. 



NON-GOVERNMENT SOURCES 
Financial, Trade, and Industrial Associations 

47. Reports of National Industrial Conference Board by National Industrial 
Conference Board, Inc., New York 

a) The Economic Record (semi-monthly) 

Data on wages, earnings, hours, and employment by individual indus- 
tries; also cost of living. All data except retail food prices are collected by 
the Conference Board, and are independent of similar series published by 
the Bureau of Labor Statistics. 

b) The Management Record (monthly) 

Data similar to those in Economic Record with articles of interest to 
employers. 

c) Special studies (irregularly) as supplements to The Economic Record. 

48. Annual Statistical Report of American Iron and Steel Institute (annually) 
New York 

Data concerning all iron and steel products, classified by types and by 
states for a period of years. The report includes also data on prices, for- 
eign trade, production in other countries, and some information on allied 
industries, as coal and coke. 

49. Electrical Research Statistics (monthly) by the Edison Electric Institute, 
New York 

A single sheet giving classified data of production, consumption, and 
sales of electric power. A similar sheet is also issued weekly, and a more 
comprehensive annual bulletin. This series supersedes similar bulletins 
published until 1937 by the National Electric Light Association, New York. 

50. Automobile Facts and Figures (annually) by Automobile Manufacturers 
Association, New York 

Devoted exclusively to data related to the automobile industry; includes 
production, sales, registration, taxation, financing, exports, truck trans- 
portation, used car sales, and allied data. 




51. Statistical Bulletin (annually) by American Petroleum Institute, New York 

A complete collection of data relative to the petroleum industry includ- 
ing production, consumption, imports and exports, and stocks on hand for 
the various products. The data are given monthly for the last two years 
and annually back to 1918. 

Monthly supplements of the Statistical Bulletin give current figures 
comparable with those in the annual issue. Additional current data dealing 
mainly with crude oil production by producing areas are supplied weekly. 

52. Exchange (monthly) by New York Stock Exchange, New York 

Supersedes the New York Stock Exchange Bulletin, giving summary 
data of the activities of the New York Stock Exchange, number and volume 
of sales, etc. 

53. Monthly Survey of Life Insurance Sales in the United States (monthly) 
by Life Insurance Sales Research Bureau, Hartford, Conn. 

A report of new ordinary insurance written for the current month and 
for the year to date, subdivided by states and regions. In 1937 a special 
report was published giving revised sales figures monthly from 1930 to 
1936. 

Financial Magazines and Papers 

54. The Commercial and Financial Chronicle (weekly) by Wm. B. Dana Co., 
New York 

Gives stock and bond quotations on the various exchanges for the pre- 
ceding week, banking and financial data currently reported, corporation 
balance sheets and statements, general industrial, trade and commodity data, 
news, and comments. Difficult to use because of variable content from 
week to week, but a valuable source for a wide variety of data. 

55. Business Week (weekly) by McGraw-Hill Publishing Co., Inc., New York 

A page entitled "Figures of the Week" contains data on production, 
trade, prices, finance, and banking. 

56. Barron's (weekly) by Barron's Publishing Co., Inc., New York

Material very similar to the Commercial and Financial Chronicle, but 
presented in somewhat more popular style. Features several original indexes, 
barometers, etc. 

57. Dun's Review (monthly) by Dun & Bradstreet, Inc., New York 

General analysis of business conditions including regional indexes. The 
original source for data on business failures and indexes of wholesale com- 
modity prices. 

Each month Dun's Statistical Review is published as a supplement to 
the regular magazine. This supplement contains more detailed data on the 
subjects included in the magazine. 




58. Wall Street Journal (daily, except Sundays and holidays) by Dow-Jones & 
Co., Inc., New York 

Current events of economic and financial interest in United States and 
world. The previous day's quotations on stocks and bonds on exchanges 
throughout the country and in foreign countries, as well as commodity in- 
formation are given. Indexes of stock, bond, and commodity prices, divi- 
dend payments, and industrial data are included. 

Statistical Services 

59. Standard Trade and Securities, Statistical Bulletin (annually and monthly) 
by Standard Statistics Co., Inc., New York 

A complete record of monthly data concerning business and financial 
operations running back as far as the data are available. This is one of the 
most valuable reference books in print. The series are kept current in 
monthly supplements which may be bound with the most recent annual 
volume. 

60. Moody's Manuals of Investment (annually) by Moody's Investors Service,
New York 

Contains financial statements of several thousand corporations both 
domestic and foreign, including a brief history, balance sheet and income 
statement of each corporation and a record of securities in the hands of 
the public. There are five volumes published each year: industrials,
railroads, public utilities, banks and finance, government and municipals.

Trade and Industrial Magazines 

(The titles of these magazines give sufficient indication of the type of data 
contained, consequently the descriptions have been omitted) 

61. Iron Age (weekly) by Chilton Co., Philadelphia, Pa. 

62. Steel (weekly) by Penton Publishing Co., Cleveland, Ohio 

63. Metal Statistics (annually) by American Metal Market Co., New York 

64. India Rubber World (monthly) by Bill Brothers Publishing Co., New York 

65. Textile World (monthly) by McGraw-Hill Publishing Co., Inc., New York 

66. Northwestern Miller (weekly) by The Miller Publishing Co., Minneapolis, 
Minn. 

67. Automotive Industries (weekly) by Chilton Co., Philadelphia, Pa. 

68. Railway Age (weekly) by Simmons-Boardman Publishing Co., New York 

69. Marine Engineering and Shipping Review (monthly) by Simmons-Board- 
man Publishing Co., New York 




70. Engineering and Mining Journal (monthly) by McGraw-Hill Publishing 
Co., Inc., New York 

71. Printers' Ink (weekly) by Printers' Ink Publishing Co., New York 

72. Chain Store Age (monthly) by Chain Store Publishing Co., New York 

PROBLEMS 

1. A young stockbroker interested in general business conditions is planning 
a small library of statistical source material. The following list has been 
selected as adequate: World Almanac, current year; subscription to 
Monthly Summary of Foreign Commerce; subscription to Commercial 
and Financial Chronicle; subscription to Business Week; Statistical Abstract 
of the United States, most recent volume; Vol. I of Population, Sixteenth 
Census. 

a) Which of the foregoing would you retain? 

b) Name four others that should be included. 

c) Give reasons for your choice in (a) and (b). 

2. Name the publications that correspond to the following descriptions: 

a) Published monthly, by a government agency, containing some textual 
material and about 50 pages of tables that are practically identical 
in form from month to month, chiefly on the subject of finance. 

b) A 4-page leaflet published weekly by a government agency, containing 
certain indexes and other current weekly data in every issue; and also 
a number of series of monthly data, some of which appear in one 
issue and some in another, during each month. 

c) A group of large volumes published annually by a private company, 
each volume of which contains complete information concerning 
individual corporations of a certain type. 

d) A monthly government publication dealing solely with exports and 
imports, the December issue of each year constituting a summary of 
that year's data. 

e) A volume published by a private statistical concern, containing long 
series of monthly data and index numbers on every phase of business, 
the series being kept up-to-date by the addition of current supplements. 

f) A series of volumes, published at intervals of an irregular number
of years and under slightly different titles, by a government agency, 
each issue containing the most complete data available in the United 
States on trade and various aspects of business other than industrial 
production. 

g) A weekly periodical, non-government, each issue of which contains 
current data on steel prices, with much more complete data on produc- 
tion, shipments, etc., in a large special number issued during January 
of each year. 




3. From the following list, describe each publication according to the five 
methods of classification named in chapter IX (instructor will assign 
one or more to each student according to the time available): (a) Monthly
Labor Review; (b) Minerals Year Book; (c) Survey of Current Business;
(d) Abstract of the Census; (e) Census of Business; (f) Statistical
Abstract; (g) Monthly Summary of Foreign Commerce; (h) Moody's
Manuals of Investments; (i) Commercial and Financial Chronicle.

4. (Class exercise.) Name a source in which you think each of the follow- 
ing sets of data would be available. Explain your choice in each case. 
a) The number of tons of pig iron produced in the United States monthly
from 1929 to 1936 inclusive.

b) The number of employees on the payrolls of manufacturing concerns
in the United States in 1934, 1935, and 1936.

c) The number of gallons of gasoline consumed monthly in the United
States during the first six months of last year.

d) The number of automobiles produced in the United States in 1935.

e) The amount of sales by chain grocery stores in the state of New York
in 1939.

f) The value of agricultural products exported by the United States
during the most recent month.

g) The number of freight car loadings of grain and grain products
shipped in the United States during the year before last.

h) The index of department-store stocks for the United States, for the
most recent month.

5. Give exact reference to a publication (not mentioned in the text) con- 
taining numerical data not in tabular form. 

6. List publications (not mentioned in this section of the text) illustrating 
each subheading under the classification "Frequency of Publication." 

7. Give exact reference to a special statistical study (not mentioned in the 
text). 



CHAPTER X 
THE USE OF LIBRARY SOURCES 

INTRODUCTION 

A SUPERFICIAL consideration of the matter might easily lead
one to expect that the entire task of collecting data from 
library sources consists in copying a quickly discovered list of 
figures from a book readily supplied by a library attendant. This is 
not what usually happens. Only in highly specialized libraries will an 
attendant be found who is trained in the intricacies of source material. 
In most cases the library staff will not be able to render any greater 
service than that of obtaining books and magazines from the stacks 
on request. 

Efficiency in collecting data from libraries comes only with long 
practice. It is a case primarily of learning to know what data to expect 
in different sources. While the beginner has no choice but to use 
what might be called the "shotgun" method, that is, search until the 
desired data happen to be found, a seasoned investigator uses a process 
of elimination based on his previous experience to narrow his search 
to two or three likely sources. By contrast this might be called the 
"rifle" method. If his selection has been accurate very little time will 
be required to find the data, or to obtain a guide as to where they may 
be found, or to discover that they are not available. In passing from 
the "shotgun" to the "rifle" method, there are two major things to be 
considered: (1) how to find a good source and (2) how to use it
after it has been found. 

FINDING A GOOD SOURCE 

The purpose of this section is to set up a sequence of steps which 
can be generally employed in searching for a desired set of data. The 
process is one of successive elimination, but some guidance in the order 
of procedure will facilitate the work. There are usually two stages 
in the search, finding data on the general subject and finding a par- 
ticular set of data. There is no way of knowing in advance at what 
point the search should be concentrated on specific information. That 
must be determined in individual cases according to the circumstances. 


Steps 

The following steps are suggested in making a search of library 
sources. 

Step 1. There are several standard reference sources which should
be consulted for information on the desired subject. These are:
Statistical Abstract of the United States, 1 Agricultural Statistics, 2
Survey of Current Business, 3 Monthly Labor Review, 4 Federal Reserve
Bulletin, 5 Standard Trade and Securities Statistical Bulletin. 6 Look
in the indexes of these publications for the subject of the search. If 
the particular data can be obtained from one or several of them the 
search is ended. 

Step 2. If the desired data cannot be found in these sources, study 
the titles, headnotes, footnotes, and references of tables on the general 
subject to discover the original sources which may contain more detail. 
Study these detailed sources in turn for references to collateral sources. 

Step 3. If steps 1 and 2 have not led directly to the publication 
containing the precise information required, it is time to consult a 
bibliography of source material. The current edition of Market Research
Sources 7 provides the most useful guide for any subject related
to domestic marketing. It contains a full "finding guide" of subjects 
followed by a list of government and non-government publications 
classified according to publishing agency. Books and yearbooks are 
included as well as periodicals. 

If current data are desired, it is quite likely that their origin can
be traced through the use of another publication of the United States
Department of Commerce entitled Sources of Current Trade Statistics. 8
This book is arranged in ready reference form so that the source of
a particular series of data can be found through a finding index in the
first part of the book and a list of sources in the second part. Neither
the finding index nor the list of references includes any annual
publications or statistical compendiums. For example, the Statistical
Abstract and the Standard Trade and Securities Statistical Bulletin are
not mentioned. This guide does, however, include some references on
foreign trade, which is one of the few subjects not covered by Market
Research Sources. Data expressed in index number form can often be
located by referring to An Index to Business Indices. 9 This book
contains a finding index that is convenient to use in locating the
detailed descriptions of indexes appearing in the second part of the
book. These descriptions include the names of the source or sources
in which the desired index can be obtained.

1 Appendix A, No. 8.

2 Appendix A, No. 23.

3 Appendix A, No. 1.

4 Appendix A, No. 16.

5 Appendix A, No. 34.

6 Appendix A, No. 59.

7 Appendix A, No. 7.

8 Latest edition to date, June, 1937.

Step 4. At this point the card catalogue of the library should be 
consulted if the data have not been found. The cards are classified by 
author, title, and subject. Look up the subject concerning which you 
want to get the data. You will probably find references to non-govern- 
ment publications. Select those which are likely to contain data and 
investigate them. If the data are still elusive the next reference should 
be to the government list of publications. These are sometimes not in- 
cluded in the main subject catalogue of the library but are listed sepa- 
rately under "United States." The classification is by departments, 
bureaus, commissions, and offices. The publications most likely to 
yield results are listed in Appendix A. 

Step 5. Each month the Government Printing Office issues the 
Monthly Catalogue of United States Public Documents which includes 
all publications during that month. Several monthly catalogues should 
be examined to discover any recent publications on the subject of 
the search. This "check list" is classified by departments, bureaus, etc. 

Step 6. If access to the stacks of the library is possible, the search 
should be continued there. Go to the section in which you have already 
found books dealing with the subject and there perhaps other publi- 
cations will be found which contain the desired data. 

Step 7. Look through trade, financial, and technical magazines. 
The ones most likely to be productive will be determined by the nature 
of the subject. Some of these are listed in Appendix A. 

Step 8. If the data are still elusive or perhaps incomplete go 
through the periodical indexes which are found in the library. The 
following are most frequently available: Readers' Guide to Periodical
Literature, 10 Industrial Arts Index, 11 Public Affairs Information
Service, 12 New York Times Index. 13

9 Donald H. Davenport and Frances V. Scott, An Index to Business Indices, Chicago:
Richard D. Irwin, Inc., 1937.

10 H. W. Wilson Co., New York.

11 H. W. Wilson Co., New York.

12 Public Affairs Information Service, New York.

13 New York Times Co., New York.




Step 9. If at this point the desired data have not been found, it
is time to consult some experienced person who may have knowledge 
of them. The experienced person for students means the teacher; for 
research workers, a fellow-worker or director. Finally, it may be 
desirable to write to a government or non-government agency for the 
information. The United States Information Service, 1405 G Street 
N.W., Washington, D. C., has been established as a Division of the
Office of Government Reports to answer inquiries regarding the 
departments and agencies of the federal government. 

Only in the most difficult cases will it be necessary to employ all of 
these steps. Usually the first two or three will be productive. After 
a few searches have been made the general contents of the major pub- 
lications will be sufficiently familiar so that in most cases the proper 
source can be selected immediately. The further one progresses in the 
use of library sources the less the need for formal methods and the 
greater the reliance on experience. 

Examples of Library Search 

Some problems for library search were assigned to a student who 
had a slight acquaintance with the titles of the various statistical pub- 
lications but very little knowledge of their contents. His report of 
the results of the search is reproduced as Appendix B at the end of 
this chapter. The report portrays the student's reaction to success or 
failure during the search with a sincerity which could not have been 
simulated by the authors if they had attempted to write this appendix. 

The examples were arranged so that successive ones would require 
the use of additional steps of the search process. Careful study of the 
student's explanations will show that in doing the first few examples 
he acquired considerable knowledge of the contents of standard sources 
which saved time in the later examples. This experience and the simi- 
lar experiences of many other students lead to the conclusion that the 
only way to acquire familiarity with the contents of published sources 
is by handling them and searching through them for some definite 
piece of information. 

THE CORRECT USE OF DATA 

The search procedure of the preceding section leads to the location
of a given set of data in a single source or in two or three alternative
sources. Before the data can be transcribed they must be put through
a process of verification and tested for validity.

Verification of Data 

The data should be verified (1) to detect discrepancies, (2) by 
cross-reference when multiple sources are available. 

Discrepancies. Discrepancies in data are usually not difficult to 
detect but may escape the unwary collector. They may appear as a 
result of one or more of the following causes. 

Changes in unit: Some of the things that may be expected are 
changes in the unit of measure, changes in the definition of the unit 
and changes in the nature of the unit. Illustrations of all of these 
changes can be found in the Statistical Abstract for 1936. An example 
of change in the unit of measure is shown in Table 524 which presents 
"Imports of Merchandise by Commodity Groups and Articles." On 
page 536 the first item is wood pulp. The unit used is long tons prior 
to 1935 and short tons beginning with 1935. A change in the definition 
of the unit appears in Table 247 entitled "Reporting Member Banks 
in 101 Leading Cities Principal Assets and Liabilities." "Demand 
Deposits Adjusted" is the heading of the next to the last column. 
Through August, 1934, the data are net demand deposits, but subse- 
quently are adjusted as explained in the footnote. The figures really 
represent two different things and cannot be regarded as a single series 
even though they are printed in the same column. Table 426, "Railway 
Equipment in Service, All Reporting Companies," shows that there was 
a larger number of steam locomotives in service in 1916 than in 1929 
despite the greater volume of traffic hauled during the later year. This 
is explained by the change in the nature of the thing counted, since a 
locomotive manufactured in 1916 was hardly the same as a locomotive 
manufactured in 1929. 
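
A change in the unit of measure, such as the shift from long tons to short tons in the wood pulp series, can be removed by restating the whole series in one unit before any comparison is made. The sketch below assumes only the standard definitions of the two tons (2,240 and 2,000 pounds); the yearly quantities are hypothetical and are not taken from Table 524.

```python
# Restate a series reported partly in long tons and partly in short tons
# on a single basis (short tons) so that the years are comparable.

LB_PER_LONG_TON = 2240   # pounds in one long ton
LB_PER_SHORT_TON = 2000  # pounds in one short ton


def long_to_short_tons(long_tons):
    """Convert a quantity in long tons to the equivalent in short tons."""
    return long_tons * LB_PER_LONG_TON / LB_PER_SHORT_TON


# Hypothetical figures for illustration only: (unit as reported, quantity).
reported = {
    1933: ("long", 1000.0),
    1934: ("long", 1100.0),
    1935: ("short", 1300.0),
}

# Convert the long-ton years; leave the short-ton years unchanged.
in_short_tons = {
    year: long_to_short_tons(qty) if unit == "long" else qty
    for year, (unit, qty) in reported.items()
}

for year in sorted(in_short_tons):
    print(year, in_short_tons[year])
```

The same conversion cannot repair a change in the definition or nature of the unit; those cases call for a break in the series, not an adjustment factor.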

Changes in classification: Arrangements according to time, space, 
or attribute may be involved. A change of the time period for record- 
ing railroad data occurred in 1916 when a shift was made from fiscal 
to calendar years. An adjustment must be made for this change if the 
earlier and later periods are to be combined in a single series. Changes 
in the boundaries of the wards of cities have the effect of changing 
the classification of any data reported by wards. Changes in attribute 
classifications appear frequently in the biennial Census of Manufactures,
as illustrated by the following statement introductory to the
section entitled "Radio Apparatus and Phonographs":

At censuses taken prior to 1931, the manufacture of phonographs was 
treated as a separate industry, but the increasing production of radio apparatus 
by manufacturers of phonographs and the introduction of the combination 
radio-phonograph unit made it desirable to establish the present classification. 
Manufacturers of radio apparatus were formerly classified in the "Electrical 
machinery, apparatus, and supplies" industry. The schedule for this industry 
did not call for detailed data on the production of radio apparatus, and
therefore no comparative statistics are given for years prior to 1931. 14

A discrepancy which is closely allied to a change in spatial classifi- 
cation occurs when the area for which data are reported is changed. 
Such changes may arise from shifts in national boundaries, in customs 
districts, or in navigable waters. On the other hand the changes may 
be of a purely statistical character, as the following examples will 
show. The Bureau of Labor Statistics report of building permits issued 
included 262 cities in 1921 and 1922. In subsequent years the number 
of cities has been gradually increased until in May, 1940, it reached 
2,047. The figures are clearly not comparable from 1921 to 1940. 
Comparable figures over a period of years for 257 identical cities 
are published in each issue of the Statistical Abstract. The birth and 
death registration area of the United States is another example of 
changing area. Starting with Massachusetts, New Jersey, and the 
District of Columbia in 1880, the registration area for deaths has been 
gradually expanded until in 1933 for the first time all of the states 
were included. The birth registration area started with ten states and 
the District of Columbia in 1915 and expanded gradually until all of 
the states were included in 1933. During this period the number of 
births and deaths cannot be compared from year to year, but birth 
rates and death rates are approximately comparable. 
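
The reason that rates remain approximately comparable while counts do not can be seen in a short calculation. The sketch below uses hypothetical births and populations for a registration area that grows between two years; none of the figures is taken from the registration reports.

```python
# Counts from an expanding reporting area rise simply because more
# territory is covered; a rate divides by the population of the same
# area and so removes most of that effect.


def rate_per_1000(events, population):
    """Events per 1,000 of the population of the reporting area."""
    return 1000.0 * events / population


# Hypothetical figures for illustration only.
registration_area = {
    1915: {"births": 250_000, "population": 10_000_000},
    1933: {"births": 2_000_000, "population": 100_000_000},
}

birth_rates = {
    year: rate_per_1000(row["births"], row["population"])
    for year, row in registration_area.items()
}

for year in sorted(birth_rates):
    print(year, registration_area[year]["births"], birth_rates[year])
```

In this illustration the recorded births increase eightfold while the rate falls, which is why the raw counts would give a misleading picture of the trend.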

Revisions: Perhaps the best example of this type of discrepancy 
is to be found in Agricultural Statistics (formerly included in the 
Yearbook of Agriculture). There are scarcely two yearbooks which
give the same series of figures for the country's wheat production. In 
the issue of 1935 corrections were made as far back as 1866. While 
there are many other cases of this kind in recorded data, it is unlikely 
that many can be found which are less stable than the records of crop 
estimates of the Department of Agriculture. Presumably the only
thing which can be done with such figures is to use the most recent
issue and hope that corrections made in subsequent issues will not
destroy the validity of the data used.

14 Census of Manufactures, 1933, p. 577.

It cannot be safely assumed, however, that the most recent or
revised figure is always correct. Errors in revisions occur less frequently
than in preliminary figures, but are more likely to be overlooked. An
example appeared in the Survey of Current Business during the early
months of 1937. The particular series involved was "Total Car
Loadings." Table 24 is a reproduction of the data with footnotes intended
to explain the changes as printed in three successive issues.

TABLE 24

TOTAL CAR LOADINGS AS PRINTED IN THE MONTHLY
SURVEY OF CURRENT BUSINESS WITH THE PERTINENT FOOTNOTES

(000 omitted)

AS PRINTED IN                      1936
THE ISSUE OF          JANUARY    FEBRUARY    MARCH

February, 1937*        2,353      3,135      2,419
March, 1937†           2,975‡     3,135      2,419
April, 1937                       2,512‡     2,419

* Data for February, 1936, are for 5 weeks, other months, 4 weeks.
† Data for January, 1936, are for 5 weeks, other months, 4 weeks.
‡ Revised.

The footnotes do not explain all that happened to this series. In the
February issue of 1937, and in the ten preceding issues, car loadings for
five weeks were included in the February, 1936, figure, giving 3,135,000
cars. Beginning with the March, 1937, issue the loadings for a week
which ended February 1, 1936, were shifted from the February total
to the January total. Thus the January, 1936, total was increased to
2,975,000 cars, but an equivalent deduction was not made from the
February total. As a result 622,000 cars reported for the week ending
February 1, 1936, were counted twice in the March, 1937, issue. The
error in the February, 1936, total was corrected in the issue of April,
1937, but unfortunately the March issue is most frequently used
because it contains data for the full 12 months of 1936.
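
The double counting can be checked by comparing the figures printed in successive issues. The following sketch uses the Table 24 figures (thousands of cars); the function name and the layout of the data are illustrative choices, not part of the source.

```python
# Compare the 1936 car loadings as printed in three successive issues
# (Table 24, thousands of cars) to expose a revision made on one side only.

issues = {
    "February 1937": {"January": 2353, "February": 3135, "March": 2419},
    "March 1937": {"January": 2975, "February": 3135, "March": 2419},
    "April 1937": {"February": 2512, "March": 2419},
}


def changed_months(earlier, later):
    """Return {month: later - earlier} for months printed in both issues
    whose figures differ."""
    return {
        month: later[month] - earlier[month]
        for month in earlier
        if month in later and later[month] != earlier[month]
    }


feb_to_mar = changed_months(issues["February 1937"], issues["March 1937"])
mar_to_apr = changed_months(issues["March 1937"], issues["April 1937"])

# A week moved from one month to another should leave the three-month
# total unchanged. Here the total rises, which reveals the double count.
total_feb_issue = sum(issues["February 1937"].values())
total_mar_issue = sum(issues["March 1937"].values())
overstatement = total_mar_issue - total_feb_issue

print(feb_to_mar)      # January raised, February untouched
print(mar_to_apr)      # February lowered one issue later
print(overstatement)   # thousands of cars counted twice
```

The January increase of 622 and the later February decrease of 623 differ by one thousand cars only because each printed figure is rounded separately.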

Typographical errors: A good example is found in the record of 
bank clearings printed weekly in the Commercial and Financial 
Chronicle. Individual clearings are printed for more than 100 cities 
and in that list it is not uncommon to find as many as five changes in 
the data copied from the previous week. There is no way of knowing
which is the misprint since no explanations are included. Such errors
are most likely to occur in publications which are not carefully
proofread.

Interruptions in series: Loss of continuity in a series which has 
been published regularly creates a problem for the user. If the inter- 
ruption is brief such as the gap in the recording of bank debits caused 
by the "bank holiday" in March, 1933, simple interpolation may be all
that is needed to resolve the difficulty. There are other cases of failure 
to publish which are less easy to overcome. For example, from July, 
1933, to February, 1935, inclusive the Post Office Department found 
it inconvenient to release for current publication the figures for postal 
receipts in "Fifty Selected Cities" and in "Fifty Industrial Cities." 
Such a prolonged suspension brings to a halt any statistical work 
involving use of the missing data. 
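
For a brief interruption, simple interpolation may be taken to mean placing the missing values on a straight line between the last figure before the gap and the first figure after it. The sketch below is one such reading, with hypothetical monthly figures; it is not a description of how any agency actually filled the March, 1933, gap.

```python
# Straight-line interpolation across a short gap in a regular series.


def interpolate_gap(before, after, missing_count):
    """Estimate `missing_count` equally spaced values lying on a straight
    line between the known values `before` and `after`."""
    step = (after - before) / (missing_count + 1)
    return [before + step * k for k in range(1, missing_count + 1)]


# Hypothetical figures for illustration only: February and April are
# known, March is missing.
february, april = 120.0, 130.0
march_estimate = interpolate_gap(february, april, 1)[0]

print(march_estimate)
```

The method is reasonable only when the gap is short and the series is not strongly seasonal; for a suspension of many months, such as that of the postal receipts, the straight line has little to support it.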

Even a slight experience will afford enough background to insure 
that many of the inconsistencies in published data will be recognized. 
Beyond that lies the task of detecting the less obvious discrepancies. 
Two things are necessary for this: the first is varied experience in
collection; the second is the exercise of common sense. The latter might
be defined as a combination of experience, judgment, and figure 
perception. 

Cross-Reference. In many cases only one source can be found for 
a required set of data and no verification by cross-reference is possible. 
Frequently, however, similar data are collected by several agencies. 
In these cases all of the sources should be found as a means of de- 
termining which is most complete, which contains the data in most 
usable form, and which has the best general record of reliability. 

It is also desirable to get the most recently published source so that 
any corrections or revisions of the data will be discovered. If the record 
coincides in all of the sources, that fact gives added confidence in the 
accuracy of the data. If differences appear, the necessity of reconciling 
them arises. Discrepancies of the types enumerated in the preceding 
section may be involved or fundamental differences in the method of 
collection may be uncovered by study of the notes accompanying the 
tables. If inconsistencies arise which cannot be explained, it is neces- 
sary to search for collateral sources or perhaps to write to the collecting 
agency for further information. 

The process of comparing the data in several sources is known as
cross-reference. An example of the use of cross-reference will serve
to demonstrate the method and its advantages. Suppose that the
following problem were proposed on June 1, 1937: "Collect data on
annual production of steel ingots for the years 1932-1936, inclusive."
The information obtainable from four sources is shown in Table 25,
columns 1 to 4. The four reports contain different figures despite the
fact that the original source of all four sets of data is the American
Iron and Steel Institute.

TABLE 25

PRODUCTION OF STEEL INGOTS IN THE UNITED STATES, ANNUALLY 1932-36,
AS REPORTED IN FOUR SOURCES

(thousands of long tons)

YEAR   STATISTICAL   ANNALIST    STEEL¶    STEEL YEARBOOK   REVISED SERIES
       ABSTRACT*                           OF INDUSTRY**    BESSEMER AND
                                                            OPEN HEARTH
                                                            PRODUCTION
          (1)          (2)        (3)          (4)              (5)

1932     13,681      13,323†    13,464       13,323           13,323
1933     23,232      22,594†    22,894       22,594           22,594
1934     26,055      25,599‡    25,949       25,599           25,599
1935     34,093      33,426§    33,940       33,418           33,418
1936                 46,919‖    47,513                        46,808

* 1936 issue, p. 705.
† December 7, 1934, p. 790.
‡ February 14, 1936, p. 270.
§ April 10, 1936, p. 549.
‖ February 12, 1937, p. 277.
¶ May 10, 1937, p. 32, second table, "Annual Steel Ingot Production."
** January, 1937, p. 360, "Steel Ingot Production, 1917-1937."

The title of the table from the Statistical Abstract states that steel
ingots and steel for castings are included. Since the problem asks for
steel ingots only, these data would not be satisfactory, even though
the figure for 1936 could be supplied from current sources. An
examination of the March, 1937, Survey of Current Business in which the
tonnage for steel ingot production and castings is given separately
indicates that none of the other three series, Table 25, columns 2 to 4,
includes castings. Further study is needed, however, to reconcile the
differences in these three series. The Annalist data correspond to the
Steel Yearbook through 1934, but differ in 1935. If the Annalist for
October 9, 1936, instead of April 10 is used, revised monthly data
are found for all of 1935 which agree with those in the Yearbook,
leading to the conclusion that the 1936 Annalist figures will likewise
be revised later in 1937. It can now be concluded that these two series
coincide, with only the revised 1936 figure lacking. Headings and
footnotes to the respective tables indicate that both include only Bessemer
and open-hearth processes.




The data from Steel, column 3, are classified in the original source 
according to processes including crucible and electric as well as Bes- 
semer and open-hearth. The difference between this series and the 
other two is explained by the inclusion of production by the crucible 
and electric processes. If a subtotal for Bessemer and open-hearth 
processes only is computed from the original table, the results coincide 
through 1935 with those from the Annalist and the Steel Yearbook. 
Since the May issue of Steel was published later than either the Year- 
book or the Annalist, one can assume its 1936 figure is the more recent 
revision. 

The series for steel ingot production by open-hearth and Bessemer 
processes can therefore be completed as shown in column 5, and there 
is now no disagreement among the three sources. There are, however,
two complete series to choose from: column 3, which includes crucible
and electric production, and column 5, which does not. Since the reports
issued by the American Iron and Steel Institute usually include open-hearth
and Bessemer only, column 5 appears to be the more desirable
series to use.

There are two major advantages in conducting this search: (1) the
determination of the best figures to use for steel ingot production 
and (2) the collateral knowledge acquired concerning methods of 
recording data on steel ingot production. 
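
The comparison of sources described above can be sketched in a few lines of Python. This is only an illustration, not part of the original procedure: the figures are transcribed from Table 25, while the names TABLE_25 and cross_reference are hypothetical choices made for the sketch.

```python
# Figures transcribed from Table 25 (thousands of long tons).
# Years missing from a source are simply left out of its dictionary.
TABLE_25 = {
    "Statistical Abstract": {1932: 13681, 1933: 23232, 1934: 26055, 1935: 34093},
    "Annalist": {1932: 13323, 1933: 22594, 1934: 25599, 1935: 33426, 1936: 46919},
    "Steel": {1932: 13464, 1933: 22894, 1934: 25949, 1935: 33940, 1936: 47513},
    "Steel Yearbook of Industry": {1932: 13323, 1933: 22594, 1934: 25599, 1935: 33418},
}


def cross_reference(series_a, series_b):
    """Return {year: (a, b)} for years found in both series whose figures differ."""
    shared_years = sorted(set(series_a) & set(series_b))
    return {
        year: (series_a[year], series_b[year])
        for year in shared_years
        if series_a[year] != series_b[year]
    }


# The Annalist and the Steel Yearbook agree through 1934 and differ only in 1935.
differences = cross_reference(
    TABLE_25["Annalist"], TABLE_25["Steel Yearbook of Industry"]
)
for year, (annalist, yearbook) in differences.items():
    print(year, annalist, yearbook)
```

A check of this kind only locates the disagreements. Explaining them still requires reading the headings and footnotes of each source, as was done above.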

Evaluation of Data 

Evaluation deals not so much with the accuracy of data as with 
their validity. The question is: Are these data satisfactory for the 
purpose for which they are to be used? The answer can be obtained 
by understanding the background of the collection and by visualizing 
the collection process. 

Understanding the Background. Data come to exist either as a 
by-product of non-statistical activity or directly for statistical purposes. 
There are many examples of series of data which are collected for 
statistical purposes. The work of the Bureau of Census, the Bureau 
of Labor Statistics, and the Bureau of Agricultural Economics is carried 
on for the purpose of providing numerical information for general use. 
The purpose is directly statistical. 

Illustrations of data secured as a by-product of other activity are 
gasoline consumption by motor vehicles and cigarette consumption, 
both obtained by the Bureau of Internal Revenue in the course of
collecting the taxes levied on these articles by the government. Further
examples are a census of employment, which might be tabulated from 
the registration cards for retirement annuities filed with the Social 
Security Board by employed workers at the end of 1936, and an index 
of grocery prices which might be computed from the newspaper adver- 
tising of grocery stores. 

By-product data are collected for some official or business purpose. 
Once they have served that purpose the collectors have no further 
interest or at most only a collateral interest in them. They may be 
kept in poor form; errors corrected for the major purpose may be 
omitted from the statistical record; there may be overlaps and omis- 
sions which creep in because the statistical record has not been ade- 
quately checked; the data may not be in usable form for statistical 
purposes, although serving the major purpose well. Since the data 
from by-product sources are likely to contain inaccuracies, it is desirable 
wherever possible to cross-check them in a direct statistical source. 

Visualizing the Collection. This means asking the question: How 
were the data collected? By answering this question considerable in- 
sight will be gained concerning the difficulties that were encountered 
in collecting the data and consequently a fair basis may be obtained 
for judging their reliability. An example will show what is involved 
in visualizing the collection. 

The United States Department of Agriculture publishes estimates of 
wheat production annually. To collect complete information concerning 
the amount of wheat produced would involve canvassing each year more 
than half of nearly 7,000,000 farmers in the United States. This would 
be a long and costly task and even if it were possible to do the work 
the results would contain some error because many farmers have no 
accurate record of the size of their wheat crop. Hence the Department 
of Agriculture makes no attempt to collect complete data annually. 
There are crop reporters in all parts of the country who voluntarily 
send in estimates of the number of acres planted in wheat in the 
sections their reports cover. Only a small part of the wheat acreage 
in the country is thus reported, but by applying the estimates to unre- 
ported areas statisticians are able to calculate the acreage planted in 
wheat for the entire country. Then at harvest time the same crop 
reporters send in estimates of the average yield per acre in their ter- 
ritories. By multiplying the estimated acreage by the estimated yield 
per acre the approximate production can be obtained for each section
of the country. The total of these sectional estimates is the only annual
production figure available for the whole United States. Every five 
years (ten years, prior to 1920) an actual census of production is taken 
and the estimates are checked against the census. Table 26 shows that 
the estimates varied from the census by more than 3 per cent on only 
two occasions and in four cases have varied by less than 1 per cent. 
Hence the conclusion is that the Department of Agriculture annual 
estimates of production are fairly accurate, but the margin of error 
inherent in the method of collection must be kept in mind when they 
are used. 

TABLE 26

COMPARISON OF DEPARTMENT OF AGRICULTURE ESTIMATES OF
WHEAT PRODUCTION WITH BUREAU OF CENSUS COLLECTION*

             (1)               (2)               (3)
        DEPARTMENT OF        BUREAU
YEAR     AGRICULTURE        OF CENSUS     PER CENT VARIATION
          ESTIMATE         COLLECTION      (1) ÷ (2) - 100%
        (IN BUSHELS)      (IN BUSHELS)

1879     459,234,000       459,483,000         - .05
1889     504,370,000       468,374,000         +7.69
1899     655,143,000       658,534,000         - .51
1909     683,927,000       683,379,000         + .08
1919     952,097,000       945,403,000         + .71
1924     841,617,000       800,877,000         +5.09
1929     823,217,000       800,649,000         +2.82
1934     526,393,000       513,213,000         +2.57

* Agricultural Statistics (1937), pp. 9-10.
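
The per cent variation in column 3 can be recomputed from columns 1 and 2 as a check on the table. The following Python sketch is offered only as an illustration. The figures are taken from Table 26, with the 1934 estimate read as 526,393,000 (an assumption consistent with the printed variation of 2.57); the names TABLE_26 and percent_variation are hypothetical.

```python
# (Department of Agriculture estimate, Bureau of Census collection), in bushels.
TABLE_26 = {
    1879: (459_234_000, 459_483_000),
    1889: (504_370_000, 468_374_000),
    1899: (655_143_000, 658_534_000),
    1909: (683_927_000, 683_379_000),
    1919: (952_097_000, 945_403_000),
    1924: (841_617_000, 800_877_000),
    1929: (823_217_000, 800_649_000),
    1934: (526_393_000, 513_213_000),  # first figure assumed, see lead-in
}


def percent_variation(estimate, census):
    """Column (3): estimate as a per cent of the census figure, less 100."""
    return estimate / census * 100 - 100


variations = {
    year: round(percent_variation(estimate, census), 2)
    for year, (estimate, census) in TABLE_26.items()
}

# Years in which the estimate missed the census by more than 3 per cent,
# and years in which it came within 1 per cent.
over_3 = [year for year, v in variations.items() if abs(v) > 3]
under_1 = [year for year, v in variations.items() if abs(v) < 1]

for year, v in variations.items():
    print(year, f"{v:+.2f}")
```

The two lists agree with the statement in the text: two years exceed 3 per cent and four fall under 1 per cent.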

Example of Evaluation. Table 27 illustrates many of the pitfalls 
in the use of data and shows the method of evaluating data from the 
notes which accompany the table. 

One quickly detects from reading the several notes that the informa- 
tion contained in this table has variable accuracy. For some states the 
sales are determined by the number of tags addressed to consumers 
in that state by fertilizer manufacturers. If the counts are kept ac- 
curately, if the bags are all the same size, and if car-load shipments 
sent to retailers near state boundaries are distributed mainly in the 
state in which the retailer resides, then the tag count may give fairly 
good results. For other states estimates are made either by state 
authorities or by the National Fertilizer Association. Actual records 
of sales are compiled by state authorities for another group of states. 
For the year 1929 data collected by the Census of Agriculture in 1930 
are used as the most reliable estimates of sales in some states but not 
in others. 






TABLE 27
FERTILIZER: ESTIMATED SALES IN THE UNITED STATES

NOTE. Data are based on fertilizer tag sales for some States and are compiled by
State authorities from sales records, etc., for others, as indicated by footnotes. For 1929,
census data have been used in many cases. Other figures are estimates made by State
authorities or the office of the National Fertilizer Association.

(In tons of 2,000 pounds)

DIVISION AND STATE           1928          1929        1935 (prel.)

United States             7,985,019     8,078,548      6,191,321

New England                 365,119       357,465        282,503
  Maine                     178,750       185,650        125,000
  New Hampshire*             16,900        11,500†        16,000
  Vermont^                   16,911        14,905         15,295
  Massachusetts*$            70,458        68,611†        63,208
  Rhode Island               10,100         7,909†        12,000
  Connecticut                72,000        68,890†        51,000

Middle Atlantic             743,558       798,433        658,874
  New YorkJ                 260,000       287,959†       234,000
  New Jersey||              143,574       162,361†       149,408
  Pennsylvania^             339,984       348,113†       275,466

East North Central          755,711       820,402        658,696
  Ohio*                     320,866       338,662        306,509
  Indiana^                  221,082       250,201        194,946
  Illinois!^                 30,509        38,056         23,827
  Michigan                  150,213       152,812        105,000
  Wisconsin^                 33,041        40,671         28,414

etc.

* Year ended June 30, except data for 1929.
† Agricultural census.
‡ Compiled by state authorities, except as noted.
§ Year ended March 31, except data for 1929.
|| Year ended October 31.
¶ Based on tag sales.

Source: The National Fertilizer Association, Statistical Abstract (1936), p. 598.

Certain other peculiarities should also be noted. In New Hampshire, 
Massachusetts, Rhode Island, and New Jersey there is a discontinuity 
between 1928 and 1929 data. For example, for New Hampshire the 
1928 data cover the period from July 1, 1927, to June 30, 1928, 
whereas the 1929 data cover the calendar year, January 1 to December 31.
Hence the table contains no record of sales in these states during
the second half of 1928. A further difficulty appears in footnote ||.
Presumably it should read "Year ended October 31, except data for
1929," since footnote † on the 1929 data for New Jersey shows that
they are census data and we know that the Census of Agriculture covered
the calendar year. Finally, revised 1935 figures are to be found
ered the calendar year. Finally, revised 1935 figures are to be found 
in the 1937 Statistical Abstract. 
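
The discontinuity noted for New Hampshire can be made explicit by writing down the period each figure covers and computing what lies between them. The Python sketch below is only an illustration under that reading of the footnotes; the function name uncovered_gap and the variable names are hypothetical.

```python
from datetime import date, timedelta


def uncovered_gap(earlier_period, later_period):
    """Return (first_day, last_day) not covered by either period, or None.

    Each period is a (start_date, end_date) pair, both dates inclusive.
    """
    gap_start = earlier_period[1] + timedelta(days=1)
    gap_end = later_period[0] - timedelta(days=1)
    if gap_start > gap_end:
        return None  # the periods touch or overlap, so nothing is missing
    return gap_start, gap_end


# New Hampshire: the "1928" figure is for the year ended June 30, 1928;
# the "1929" figure is census data for the calendar year 1929.
new_hampshire_1928 = (date(1927, 7, 1), date(1928, 6, 30))
new_hampshire_1929 = (date(1929, 1, 1), date(1929, 12, 31))

gap = uncovered_gap(new_hampshire_1928, new_hampshire_1929)
gap_days = (gap[1] - gap[0]).days + 1
print(gap[0], gap[1], gap_days)
```

The gap runs from July 1 to December 31, 1928, which is the second half of 1928 mentioned above.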

The detailed analysis of the notes accompanying this table indicates 
the method of evaluating data in terms of the background and surrounding
circumstances. Footnotes and headnotes should always be
studied carefully to discover what explanations the author of the table
believed necessary to its comprehension. To disregard such notes is
a direct failure to use the available means of verification and evaluation
of the data in the table.

Transcribing the Data 

The final step in the collection process is to transfer the data from 
the source to a collection form. Although this appears to be a purely 
routine matter there are certain rules which, if observed, will help 
to avoid trouble later. Always assemble all of the data before doing
any copying. Too frequently it has been the authors' experience that 
students bring in part of a series of data and ask for advice on how 
to complete the series only to find that it cannot be completed and a 
new series must be found. Until all of the data have been found there 
is no way of knowing whether partial discoveries will be usable. 

In transcribing data always start with the publication of most 
recent date and work back to the earlier dates. When revisions have 
been made from time to time, the best way to discover them is by com- 
paring data in the latest publication with overlapping data of an earlier 
publication. For example, if it is necessary to obtain data from 1929 
to 1940, inclusive, and data from 1933 to 1940 are found in one issue 
of a publication, then the latest issue containing data from 1929 to 
1933 should be used to complete the series. Data for 1933 appear in 
both issues and should be compared to insure that no change has 
occurred in the recording and that the same series is being taken from 
both issues. 
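
The overlap check just described can be sketched in Python as follows. This is a minimal illustration, not a prescribed routine: the function name splice_series is hypothetical, and the figures are made-up placeholders chosen only to show the mechanics.

```python
def splice_series(latest_issue, earlier_issue):
    """Join two issues of one series after checking the years they share.

    Both arguments map year -> value. A ValueError is raised when the
    issues share no year or when a shared year carries different figures.
    """
    overlap = sorted(set(latest_issue) & set(earlier_issue))
    if not overlap:
        raise ValueError("no overlapping year; the issues cannot be compared")
    mismatches = {
        year: (earlier_issue[year], latest_issue[year])
        for year in overlap
        if earlier_issue[year] != latest_issue[year]
    }
    if mismatches:
        raise ValueError(f"figures differ in overlapping years: {mismatches}")
    combined = dict(earlier_issue)
    combined.update(latest_issue)  # start from the most recent publication
    return dict(sorted(combined.items()))


# Placeholder figures (not real data): one issue covers 1933-1940,
# an earlier issue covers 1929-1933, and 1933 appears in both.
latest = {1933: 70, 1934: 75, 1935: 81, 1936: 90, 1937: 96, 1938: 85, 1939: 93, 1940: 101}
earlier = {1929: 100, 1930: 88, 1931: 76, 1932: 64, 1933: 70}

series = splice_series(latest, earlier)
print(list(series))
```

When the check fails, the figures should not be spliced until the reason for the disagreement has been found.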

In another case the data may not agree in the two issues. Then 
three possibilities arise: (1) Explanations accompanying the tables 
may state the nature of the revision involved and how to make the 
series comparable in the two issues. (2) No explanation of the change 
may be given and it will be necessary to find another source containing 
the same series in comparable form or a substitute series that will serve 
the purpose. (3) Failing in both of the preceding expedients, the search 
may have to be abandoned. Difficulties in matching series in different 
issues of a source book occur most frequently as the result of shifting 
the base of an index. Such a revision can usually be adjusted unless an 
accompanying change in the method of constructing the index entirely 
destroys the comparability of the two parts of the series. 
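
Where only the base of an index has been shifted, one common adjustment is to link the two parts through a year that appears on both bases. The Python sketch below assumes that approach. It is not taken from the text, the function name link_to_new_base is hypothetical, and the index numbers are made-up placeholders.

```python
def link_to_new_base(old_segment, new_segment, overlap_year):
    """Restate an index segment on the base used by a later segment.

    The old figures are multiplied by the ratio of the new to the old
    value in the overlap year. This is valid only when the method of
    constructing the index is otherwise unchanged.
    """
    factor = new_segment[overlap_year] / old_segment[overlap_year]
    linked = {year: value * factor for year, value in old_segment.items()}
    linked.update(new_segment)  # the later publication governs shared years
    return dict(sorted(linked.items()))


# Placeholder index numbers (not real data). 1931 appears on both bases.
old_base = {1929: 100.0, 1930: 90.0, 1931: 80.0}
new_base = {1931: 50.0, 1932: 45.0}

linked_index = link_to_new_base(old_base, new_base, overlap_year=1931)
print(linked_index)
```

If the method of construction changed along with the base, no single factor of this kind will restore comparability, which is the case the text warns against.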






APPENDIX B 
EXAMPLES OF SEARCH FOR DATA IN LIBRARIES 

These examples are intended to show how a student 15 proceeded to find six 
series of data for problems which were assigned to him. 

Example I: Find the monthly freight car loadings by commodity classes for 
the ten-year period 1927-36. 

1. Thought that material would be in the Statistical Abstract of the United States, 
but found that although there were data for freight car loadings, the data were not in
monthly form. Made a mental note of this remembering that most series of figures 
included in the Abstract were indicated by years. 

2. Looked in the index to the Survey of Current Business and found the necessary 
material by individual classes and also by months for a particular period. By searching 
through the back issues or through the Supplements, I found the material available for 
the full ten years. 

3. Tried the Federal Reserve Bulletin also and found that the same material was 
included in their monthly issues under the topic of industrial activity. 

Example II: Find the monthly indexes of employment and payrolls in the 
United States for the five years from 1932 to 1936, under this particular 
heading "Retail Trade General Merchandising."

1. Discarded the thought of using the Statistical Abstract, because I wanted monthly 
figures. 

2. Reached for the Survey of Current Business and found under the heading of 
employment and the heading of payrolls that monthly data were available for "Retail 
Trade," but that no distinction was made for "Retail Trade General Merchandising." 

3. Because of the nature of the topic, I tried the Monthly Labor Review and by 
chance picked up a monthly issue dated in 1933. Here again the data were available but 
no distinction was made between General Merchandising and other merchandising. 
Searched further and found that the January, 1937, issue made this distinction in their 
current data. In looking through issues for 1936 and 1935, I found the entire series 
dating back to 1932 in the January, 1935, issue. 

Example III: Find the Freight Tonnage Originating on Class I Steam Rail- 
ways in the United States by quarters from 1927 to 1936. Designate the fol- 
lowing commodity groups separately: Products of Agriculture, Animals and 
Products, Products of Mines, Products of Forests, Manufacturers and Miscel- 
laneous, All L.C.L. Freight. 

1. Looked in the Statistical Abstract of the United States and found data on freight 
tonnage, but the data were not in commodity groups nor by quarters. The figures showed 
tons of revenue freight carried. 

2. Tried the Yearbook of Agriculture and found data entitled Freight Tonnage 
Originating on Railways in the United States and also the correct commodity groups, 
but the data were annual figures. 

3. The Minerals Yearbook did not have the data in the correct form. And I didn't 
try the Survey of Current Business nor the Monthly Labor Review, nor the Federal 
Reserve Bulletin because of their particular use of monthly figures. Of course, this is not 
always true, but the particular nature of the sources led me to believe that the data were 
not available in them. 

4. Because most of the previous data had been compiled by the Interstate Commerce
Commission, I looked in the card catalogue for particular bulletins or statements
published by the Commission or other independent establishments under the heading of
"Commercial and Industrial," or publications of the Department of Commerce. A good
index of government publications is the Monthly Catalogue of Public Documents. In
the 1935 issue, I found under the topic of freight commodity statistics that the Bureau
of Statistics of the Interstate Commerce Commission sent out statements quarterly giving
Freight Statistics on Class I Steam Railways in the United States which included the total
freight tonnage by the commodities indicated.

18 Reports written in 1937 by Robert Berner, then a sophomore in the School of
Business Administration of the University of Buffalo.

Example IV: Find the total market value of all listed stocks on the New York 
Stock Exchange by months from 1926 to 1936. 

1. Tried the Federal Reserve Bulletin and found certain data about security markets 
giving the indexes of stock prices by months, but the data required were not found 
in this source. 

2. Did not attempt looking into the Monthly Labor Review, Statistical Abstract, 
Minerals Yearbook or Agricultural Yearbook because of the nature of the subject and 
problem. 

3. Found in the Survey of Current Business the stock prices of all stocks, sales, 
yields, and other information but not the total market values of all listed stocks. At 
this time, I thought that if I had the base figure and could compute the actual figures 
for sales and stock prices, that by multiplying the two figures I might have usable 
figures of market values. This procedure would not be too accurate, however. 

4. Looked in the card catalogue for government publications other than the original 
six, but found nothing dealing with the subject except material which had been used 
in the Federal Reserve Bulletin and the Survey of Current Business. 

5. Found in the card catalogue that the New York Stock Exchange published a 
monthly bulletin. Upon getting a copy I found the acceptable material. 

Example V: Find the yearly production of steel rails from 1919 to 1936 by 
the following processes of steel manufacture: open-hearth, Bessemer, crucible, 
electric, and all others. Compare and explain the variations in each series. 

1. Went immediately to the Statistical Abstract of the United States expecting to find 
the yearly figures. I did find figures giving the total rail production from 1914 to 1935, 
but they did not indicate the four distinct processes. 

2. The Survey of Current Business contains monthly data on track work production, 
but does not contain the production according to processes. 

3. Exhausted the content indexes of the Agricultural Yearbook, Minerals Yearbook, 
Federal Reserve Bulletin, and the Monthly Labor Review, but found nothing suitable 
to my needs. 

4. Tried the other government sources, looking in the card catalogues and Index to 
Government Publications but found no discrimination in the processes used in producing 
steel rails. They did contain, at least a few sources contained, the total rail production 
figures by years, their source being the American Iron and Steel Institute. 

5. Picked up a copy of the Annalist, finding only the tons of rails ordered by
months as taken from the Railway Age magazine. This was not satisfying, nor were the 
data in the Commercial and Financial Chronicle. 

6. Attempted to find the material in the technical magazine Steel. I found the data 
contained month by month quite inconsistent except for their index of business activity. 
Each month they include a new set of data recurring irregularly. Looking through several 
copies, I decided that the material I wanted was not included. 

7. Tried the magazines Railway Age and Iron Age but found nothing satisfactory in 
either. The data in Iron Age are quite consistent, appearing in each monthly issue, 
showing the last two months, absolute figures. Their index of capital goods is one 
that is quite widely known and widely used. It shows weekly variations. Many of their 
figures on steel production and output were taken from the annual statistical report of 
the American Iron and Steel Institute. 

8. I tried this annual report, finding an abundance of data on steel production by
processes, one set of which contained the data I wanted.




Example VI: Find the dollar value of department-store sales, annually from 
1927 to 1936 for the United States. 

1. Glanced through the Agricultural Yearbook, Minerals Yearbook, and the Monthly 
Labor Review, and as I expected, found nothing pertaining to the required data. 

2. Realizing that the Survey of Current Business and Federal Reserve Bulletin usually 
contain monthly data, I nevertheless found the data on department-store sales by months 
in index form. By so doing, I hoped to find the original source of the data. The data, 
appearing in forms adjusted for seasonal and unadjusted, were compiled by the Board 
of Governors of the Federal Reserve System, Division of Research and Statistics. These 
figures represent monthly dollar sales for a sample of approximately 425 stores. 

3. Tried the Statistical Abstract of the United States, finding the dollar value of 
department store sales (including mail-order sales) for 1929 and 1933. From the various 
distinctions given to retail stores, I suspect there is a difficult problem in determining 
what kind of store may be classified as a department store. The Abstract also has an 
index of yearly department-store sales from 1919 to 1935 which is also a sample of 
from 400 to 560 stores compiled by the Federal Reserve Board. The actual dollar sales 
are available for 1929 and 1933 because of the census reports made by the Census 
Bureau. The original source of these figures comes from the Fifteenth Census of United 
States: Distribution. 

4. From the Census of Business, I found the dollar value of net sales for 1933 and 
1935. This source divided department-store sales into independents, chains, mail-order, 
commission or company stores, and all others. 

5. Found nothing in the Federal Document Index nor card catalogue that would lead 
me to the desired data. 

6. The Industrial Arts Index in the Buffalo Public Library showed several sources 
of statistical data pertaining to department-store sales, some of which are included in 
the foregoing steps. Mr. C. M. Schmalz of the Harvard University Graduate School of 
Business Administration published a report containing the "Operating Results of Depart- 
ment and Specialty Stores in 1935." Yet these data were not wide enough in scope, 
either in number of establishments or variety of years. 

7. The Commercial and Financial Chronicle contains monthly statistics of department- 
store sales in index form, taken again from the Federal Reserve Board. 

8. Looked through various technical sources and card catalogues indicating subjects 
of technical magazines and books, but found nothing about dollar value of department- 
store sales, annually from 1927 to 1936. 

9. Concluded that data were not available in published sources. 



PROBLEMS 

1. The answer to each of the following questions is to be found in a commonly
used government source. (a) Give answers to the questions (as
assigned) with exact reference to the source, (b) Describe the steps you 
followed in each case in order to locate the data. 

(1) The percentage of increase in population for the United States and 
for California, from 1920 to 1930 and from 1930 to 1940. 

(2) The total number of strikes in progress in the United States during 
the month of August of last year; the number of workers involved; 
and the number of man-days idle during the month. 

(3) The wholesale price per bushel of No. 2 hard winter wheat at 
Kansas City, for the most recent week. 

(4) The number of dozen pairs of women's full-fashioned silk hose ex- 
ported from the United States during June of last year. 




2. The answer to each of the following questions is to be found in a commonly
used non-government source. (a) Give answers to the questions
(as assigned) with exact reference to the source. (b) Describe the steps 
you followed in each case in order to locate the data. 

(1) The number of new passenger car registrations for Ford, Chevrolet, 
Plymouth, and Cadillac in November of last year. 

(2) The number of business failures in retail trade in the United States 
during the month of May each year since 1939.

(3) The percentage of American-made passenger cars sold outside the 
United States in 1935; motor trucks; total. 

(4) The gross federal debt as reported by the United States Treasury 
as of 3 days ago. 

3. State which of the steps of library search were employed in each of the 
examples in Appendix B. 

4. Write a report of discrepancies found in the bank clearings reports in 
certain cross-referencing issues of the Commercial and Financial Chronicle. 
For example, the following issues will serve the purpose: January 18,
1941, February 22, 1941, etc., at monthly intervals. 

5. A cursory search and attempted verification of the production of pig 
iron in the United States in 1937 produces the following figures: 

Steel, Yearbook of Industry, January 1, 1940            36,709,000 gross tons
Statistical Abstract, 1939                              35,224,000 long tons
Survey of Current Business, 1938 Annual Supplement       3,051,000 long tons (monthly average)
World Almanac, 1939                                     36,130,000 gross tons
Standard Trade & Securities, Statistical Bulletin          100,300 gross tons (daily average)
Pentorfs Almanack 1940-41                               41,114,000 net tons

Which of these figures would you choose? Give reasons for your choice 
by consulting these several sources; explain as completely as possible the 
apparent discrepancies. 

6. From any issue of the Commercial and Financial Chronicle select a series 
of data that are of the by-product variety. Explain the major purpose for 
which the data were collected. Evaluate the data. 

7. Before using library data, what facts would you desire to know about 

a) the nature of the data themselves? Why? 

b) the types of units in which the data are expressed? Why? 

c) the organization collecting or preparing them? Why? 

d) the purpose for which they were issued? Why? 

e) the consumers to whom they are addressed? Why? 
f) the accuracy of the data? Why?

g) the homogeneity of the conditions under which the data were col- 
lected or to which they refer? Why? 




8. Certain difficulties of collection occur in each of the following problems. 
Find as much information as you can in answering the question and 
explain the circumstances in the sources that make it difficult to secure 
complete and comparable data. (The instructor will assign one or more 
of the problems to each student, according to the time available.) 

a) An important measure of steel ingot production is "per cent of ca- 
pacity." Trace the changes in "capacity" since 1889. 

b) Compare the number of savings banks, depositors, and amount of 
savings in your own state with the United States as of recent date. 

c) Compare the changes in the number of employees in the carriage in- 
dustry and in the automobile industry at 10-year intervals beginning 
with 1900. 

d) What was the payroll of the executive branch of the United States 
government annually, 1929 to date? 

e) Compare the number of full-time employees in one-, two-, and three- 
store independent groceries in the United States with the number 
employed in chain grocery stores, in 1929 and in 1935. 

f) Select the five industries whose indexes of employment were lowest
during the most recent month, and compare these indexes with their 
indexes in 1929 and in 1932. 

9. Could Table 21, page 168, and Table 28, page 233, be used for verification 
by cross reference? Give reason for your answer. 

REFERENCES 

LANE, MORTIMER B., How To Use Current Business Statistics. Washington, 
D. C.: Government Printing Office, 1928. 

The entire pamphlet should be read to obtain background knowledge 
concerning published statistics. Chapter IV is particularly valuable. 

SCHMECKEBIER, LAURENCE F., Government Publications and Their Use. 
Washington, D. C.: The Brookings Institution, 1939.

This book tells how to find government documents. It is useful to statis- 
ticians in locating less familiar publications. 



CHAPTER XI 
RATIOS 

THE IMPORTANCE OF RATIOS IN STATISTICS 

AMONG all statistical techniques none is so commonly used as
the ratio. For instance we speak of having a national debt 
of $323 per capita; banks paying 2 per cent interest on sav- 
ings deposits; a retail merchant making a gross profit of 25 per cent 
on the cost of goods; sales 10 per cent above those of last year; a death 
rate of 11.0. Such ratios serve the twofold purpose of (1) simplifying 
data and (2) increasing their comparability. 

Ratios properly presented are so easily understood that an analysis 
of methods seems almost unnecessary. However, when the student 
changes from the role of a reader to that of a statistician who must 
transform primary data into ratio form he finds himself confronted 
with some problems. There are certain principles that determine the 
construction, presentation, and interpretation of statistical ratios. An 
exposition of these will form the content of this chapter and the next. 

CONSTRUCTION OF STATISTICAL RATIOS 

Statistical ratios are fundamentally the same as the ratios with which 
everyone becomes familiar in studying arithmetic. In chapter II on 
"The Use of Numbers," no particular mention was made of ratios, 
since from the point of view of arithmetic computations they are the 
same as any other fractions and are handled as such. However, since 
statistical ratios deal always with concrete values or quantities rather 
than with abstract numbers, certain modifications of the arithmetic con- 
cept of the ratio should be noted at the outset. These include the form 
of expression, the importance of the item used as the base, the number 
of units in which the base is expressed and the possibility of relations 
between unlike as well as like items. 

Form of Expression 

A ratio in arithmetic is the relation which one number or quantity 
has to another, its value being expressed as the abstract quotient of the
first divided by the second. The term "ratio" is applied either to the
original fraction, to the quotient or to both stated together. In statistics 
the ratio relationship is always between two precisely defined concrete 
quantities or values. This relationship is simplified through the divi- 
sion of the first term by the second, but it is never expressed as an 
abstract quotient. Instead the value of a statistical ratio is the simplified 
value of the numerator expressed in relation to one or more units of the 
denominator. 

For example, the items used in the first ratio quoted were the na- 
tional debt of the United States of $42,558,875,571, March 31, 1940, 
and the population as of April 1, 1940, 131,699,275 persons. Dividing 
the first number by the second gives a quotient of 323, but this is not 
the statistical ratio. It must rather be stated as follows: "In 1940 the 
national debt of the United States was $323 per capita" or "$323 for 
every person in the United States." 
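The per capita figure above can be checked with a short Python sketch. Python is not part of the original text, and the function name per_capita_statement is an illustrative assumption; only the debt and population figures come from the example. The point of the sketch is that the result is returned as a worded statement naming both numerator and base, not as a bare quotient.

```python
def per_capita_statement(amount_dollars: int, persons: int, year: int) -> str:
    """Return the ratio as a statement that names both numerator and base."""
    per_person = amount_dollars / persons
    return (f"In {year} the national debt of the United States "
            f"was ${per_person:,.0f} per capita")


debt = 42_558_875_571      # national debt, March 31, 1940 (from the text)
population = 131_699_275   # population, April 1, 1940 (from the text)

statement = per_capita_statement(debt, population, 1940)
print(statement)
```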

The qualifying descriptions of both numerator and denominator 
items must be either fully stated or clearly understood. When statistical 
ratios are listed in a table the exact specifications of both numerator 
and denominator are indicated in the table headings. 

Selection of Base 

In forming ratios between two abstract numbers either one may be 
used as the denominator or base, e.g., 5 ÷ 20 = 1/4 or 20 ÷ 5 = 4.
Likewise an abstract quotient can be readily understood either in the 
form .0126, 1.26, or 126; hence there is no occasion for changing the 
number of units used in the base. However, in every statistical ratio 
the numbers represent definite concrete items and consequently two 
questions must always be considered: (1) Which item is the logical 
base? (2) In what number of units shall it be expressed? These two 
points must be discussed separately. 

The Item. The denominator of a statistical ratio is always a stand- 
ard to which the numerator is being compared. The two numbers each 
refer to concrete values or quantities whose characteristics require that 
one of them should be used as the standard in terms of which the other 
is to be measured. 

In some types of ratio construction it is immediately obvious which 
of the two items is the appropriate base: 

a) In a comparison between a part and the whole, the whole is 
always the base. 



b) In time comparisons between a recent and a prior recording of 
like items, the prior event will almost always be taken as the base. 

c) In a comparison between an effect and its cause or between two
values or events one of which is at least partly dependent upon the 
other, the cause or the independent item is always the base. 

In certain other types of ratios the choice of item for the base 
depends upon the use that is to be made of the ratio: 

a) In comparisons between like totals or between two parts of the 
same total, either one may be selected as the base according to the 
emphasis desired. 

b) In various accounting ratios such as sales divided by inventory, 
custom has determined the form that is used. 

The Number of Units. The number of denominator units used as 
the base may be determined by custom, convenience, or effectiveness. 
Referring again to some of the first examples quoted in this chapter, 
the national debt is expressed in terms of one denominator unit, so
many dollars for each single individual in the population; an interest
rate of 2 per cent means two dollars for every hundred dollars de- 
posited; the death rate indicates the number of deaths during a given 
period for every thousand persons alive at the beginning of the period. 

These examples illustrate the practice of expressing the numerator 
of a statistical ratio in terms of: (a) one unit of the base, (b) 100 
units of the base, (c) other powers of 10 units of the base.

One denominator unit as the base: There are many examples in 
which the base of a ratio is expressed as a single unit. All per capita 
ratios use one person as the unit of the base. In agriculture we use 
production per acre; in railroading revenue per ton-mile and per pas- 
senger-mile. The accountant uses a 2 to 1 ratio between current assets 
and current liabilities as a standard of liquidity, and such examples 
might be listed indefinitely. The expression of the numerator of a ratio 
in terms of one unit of the denominator as the base is accomplished by 
the application of a simple proportion in which x = the desired value 
for the numerator, 

numerator: denominator = x : 1 

The solution for x requires simply that the numerator be divided by the 
denominator. The result then becomes a simplified value for the nu- 
merator in terms of one denominator unit, similar to that determined 
for the national debt per capita. 



One hundred denominator units as the base: Most of the compari- 
sons made by the lay user of statistics are in terms of per cents. Thus 
we have a 5 per cent increase in grocery prices, a 3 per cent increase 
in bank deposits, the grades of a class of students 2 per cent below the 
average, a selling price which is 130 per cent of the cost, a wheat crop 
which is 85 per cent as great as last year, humidity of 75 per cent and 
so on. In each case the number stated as a per cent indicates how many 
numerator units there are for every hundred denominator units. 1 

An illustration of the method of expressing a ratio in terms of 100 
units as a base may be taken from Table 21, page 168. Column 1 
gives the number of telephones in use during each year from 1931 to 
1936. To find the ratio of telephones in use in 1936 to those in 1931 
the formula that was used before could be applied but the result would
then be 14,454/15,390 or .94 telephones in 1936 for every one in 1931. Since
it is difficult to visualize .94 of a telephone, a base of 100 units instead
of one should be chosen.

14,454 : 15,390 = x : 100



The numerator was divided by the denominator as before but the deci- 
mal point was moved two places to the right. The result may be stated: 
"There were 94 telephones in use in 1936 for every 100 in 1931" or 
"The number of telephones in 1936 was 94 per cent of the number
in use in 1931." 
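The proportion numerator : denominator = x : base may be sketched in Python as follows. The function name ratio_per_base and its parameter names are assumptions made for illustration; the telephone figures (in thousands) are those used in the proportion above.

```python
def ratio_per_base(numerator: float, denominator: float, base_units: int = 1) -> float:
    """Solve numerator : denominator = x : base_units for x."""
    if denominator == 0:
        raise ValueError("the base item must not be zero")
    # Multiply before dividing so whole-number cases stay exact.
    return numerator * base_units / denominator


telephones_1936 = 14_454   # thousands of telephones in use, 1936
telephones_1931 = 15_390   # thousands of telephones in use, 1931

per_one = ratio_per_base(telephones_1936, telephones_1931)
per_hundred = ratio_per_base(telephones_1936, telephones_1931, 100)

print(f"{per_one:.2f} telephones in 1936 for every one in 1931")
print(f"{per_hundred:.0f} telephones in 1936 for every 100 in 1931")
```

Changing only base_units moves the decimal point; the underlying relationship between the two items is the same in both statements.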

Other powers of ten denominator units as a base: Ten, 1,000, 10,000,
100,000, or even larger numbers of units may be used in the base.
An advertiser may state that four out of every ten refrigerators sold 
last month were "Evercolds," the intention being to express the prefer- 
ence for the Evercold product even more vividly than would be the case 
if the advertisement stated that 40 per cent were "Evercolds." The 
use of telephones is expressed in the form, "number of telephones per
thousand population." In a published chart dealing with automobile
fatalities the following ratios were presented: deaths per 10,000 cars 
registered, deaths per 100,000 population, deaths per 10,000,000 gal- 
lons of gasoline consumed. Fish hatcheries study the propagation of 
fish in units of 1,000,000 fingerlings planted. The hazards of different
methods of transportation are expressed by comparing the number of
deaths to the number of miles traveled in units of 100,000,000 miles.

1 The construction and use of per cents may be reviewed by referring to chapter II.

Similar usage appears in the field of vital statistics. The death rate 
for the United States in 1933 was 10.7 per 1,000 persons living in the 
entire area and the birth rate was 16.5 per 1,000 persons living in the 
area. When dealing with specific causes of death the base used is 
100,000. Thus the 1933 death rate from cancer was 102.2 per 100,000 
population in the United States, while the suicide rate was 15.9 per 
100,000 population. 

There are two rules that determine whether one or some higher 
power of ten units should be used as the base: 

1. The number used as the base should be large enough so that the 
value of the numerator will appear mainly as a whole number but will 
have not more than three digits to the left of the decimal point. In 
Table 28 the figures in column 4 are unwieldy. The rule for significant 
figures permits carrying these quotients to four or even five digits but 
one of the advantages of the use of ratios, simplicity, has been lost. 
The most effective form for these ratios is shown in column 3 in which 
the results appear as whole numbers of only three digits each. 

TABLE 28

ESTIMATED NUMBER OF TELEPHONES IN USE IN THE UNITED STATES, ESTIMATED
POPULATION, AND RATIOS OF THE TWO AT FIVE-YEAR INTERVALS, 1920-35*

            (1)              (2)              (3)            (4)
            ESTIMATED        ESTIMATED        TELEPHONES     TELEPHONES
            NUMBER OF        POPULATION       PER 1,000      PER 10,000
  YEAR      TELEPHONES       (000 omitted)    POPULATION     POPULATION
            (000 omitted)

  1920      13,329           106,543          125            1,251
  1925      16,936           114,867          147            1,474
  1930      20,201           123,091          164            1,641
  1935      17,503           127,521          137            1,373

* Statistical Abstract, 1936: Telephones, p. 344; Population, p. 10.
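The first rule can be sketched as a small Python search over powers of ten. This is one possible reading of the rule, not the authors' procedure: the function choose_base (a hypothetical name) picks the largest power of ten that still leaves no more than three digits to the left of the decimal point. Custom may prefer a smaller base, as with death rates per 1,000, and the sketch does not apply the second rule on the size of the original denominator.

```python
def choose_base(numerator: float, denominator: float, max_digits: int = 3) -> int:
    """Largest power-of-ten base giving at most `max_digits` whole-number digits."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("both items must be positive")
    base = 1
    # Step up while the next power of ten still fits within max_digits digits.
    while numerator * base * 10 / denominator < 10 ** max_digits:
        base *= 10
    return base


# Telephones and population, both in thousands, from Table 28.
table_28 = {
    1920: (13_329, 106_543),
    1925: (16_936, 114_867),
    1930: (20_201, 123_091),
    1935: (17_503, 127_521),
}

bases = {year: choose_base(t, p) for year, (t, p) in table_28.items()}
per_thousand = {year: round(t * 1000 / p) for year, (t, p) in table_28.items()}

print(bases)
print(per_thousand)
```

For every year the search settles on a base of 1,000, which corresponds to column 3 of the table.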

2. The number used as the base should be smaller than the number
in the original denominator; otherwise the ratio implies more stability
than is warranted. That is, a per cent should not be based on fewer
than 100 cases. A ratio expressed as so many per thousand should in-
.... used because in each month the base is less than 100. For instance, in
June each failure accounts for 14 per cent; hence one less or one more
failure in 1937 would have caused the ratio either to be doubled or
reduced to zero.

TABLE 29

NUMBER OF BUSINESS FAILURES IN BUFFALO, NEW YORK,
FIRST SIX MONTHS OF 1936 AND 1937

                                              (3)
                NUMBER OF FAILURES            PERCENTAGE OF CHANGE
                (1)          (2)              IN FAILURES
  MONTH         1936         1937             (2) ÷ (1) - 100%

  January       12           2                - 83
  February      14           5                - 64
  March         3            7                +133
  April         8            10               + 25
  May           4            11               +175
  June          7            6                - 14
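The instability of per cents computed on so small a base can be seen by recomputing column 3 of Table 29 and then altering the June 1937 count by a single failure. The sketch below is in Python; the function name pct_change is an illustrative assumption, and the counts are those of the table.

```python
def pct_change(earlier: int, later: int) -> float:
    """Percentage of change: (later / earlier) * 100 - 100."""
    return (later / earlier - 1) * 100


# (1936, 1937) business failures by month, from Table 29.
failures = {
    "January": (12, 2),
    "February": (14, 5),
    "March": (3, 7),
    "April": (8, 10),
    "May": (4, 11),
    "June": (7, 6),
}

changes = {month: round(pct_change(a, b)) for month, (a, b) in failures.items()}
print(changes)

# One failure fewer or one more in June 1937 (5 or 7 instead of 6).
june_one_less = pct_change(7, 5)
june_one_more = pct_change(7, 7)
print(round(june_one_less), round(june_one_more))
```

With a base of 7, each single failure moves the June ratio by about 14 points, so the recorded decrease of 14 per cent could as easily have been 29 per cent or nothing at all.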





The same criticism applies to the data in Table 30. A 100 per cent 
distribution has been computed from 22 cases. The column of per cents 
immediately conveys the impression that at least 100 accidents were 
involved whereas it really means that if 100 accidents had occurred 
about 36 of them would have been caused by mine cars, the estimate 
being accurate only to the extent that a prediction can be based on an 
experience of 8 cases out of 22. The computation of per cents to two 
decimal places in this table is further cause for criticism. It is spurious 
accuracy because the transfer of one accident to a different class (the 
minimum change possible in the table) would result in a change of 4.5 
points in the affected ratios. For example, if there had been 9 fatal 
accidents due to mine cars and 9 in the miscellaneous class, then the 
per cent in each of these two classes would be changed to 40.91 per 
cent. Obviously there is no reason for carrying per cents to even one 
decimal place when they are based on so few cases. 

TABLE 30

FATAL ACCIDENTS AMONG OUTSIDE WORKERS AT BITUMINOUS COAL MINES IN
PENNSYLVANIA, CLASSIFIED BY CAUSE, 1924*

  CAUSE OF            NUMBER OF        PERCENTAGE DISTRIBUTION
  ACCIDENTS           ACCIDENTS        OF ACCIDENTS

  Mine cars           8                36.36
  Railroad cars       1                4.55
  Electricity         3                13.64
  Miscellaneous       10               45.45

  Total               22               100.00

* Pennsylvania Departmental Statistics (Commonwealth of Pennsylvania, Department of
State and Finance, Harrisburg, Pennsylvania, 1925), p. 139.
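The criticism of Table 30 can be verified numerically. The Python sketch below recomputes the distribution from the 22 cases and shows the smallest change that is possible in any one per cent, namely the shift of a single accident. The row labels follow the classes named in the discussion of the table; the variable names are illustrative assumptions.

```python
# Fatal accidents by cause, from Table 30.
accidents = {
    "Mine cars": 8,
    "Railroad cars": 1,
    "Electricity": 3,
    "Miscellaneous": 10,
}

total = sum(accidents.values())
distribution = {cause: round(n * 100 / total, 2) for cause, n in accidents.items()}

# Moving one accident between classes changes a per cent by this many points.
smallest_step = 100 / total

# The case described in the text: 9 accidents in a class instead of 8 or 10.
nine_of_total = round(9 * 100 / total, 2)

print(distribution)
print(round(smallest_step, 1), nine_of_total)
```

Since no per cent in the table can change by less than about 4.5 points, the second decimal place carries no information.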



Kinds of Ratios 

The basic definition of an arithmetic ratio includes the qualification 
that the two quantities must be of the same kind and expressed in the 
same unit of measure. It follows that whenever data are homogeneous 
they provide suitable material for ratio comparisons. However, before 
proceeding with the discussion of ratios between items of the same 
kind, an explanation is necessary concerning statistical ratios that are 
made up of unlike items. 

Ratios between Unlike Items. The possibility of such ratios in 
statistics was indicated in the first section of chapter VIII in the state- 
ment that one type of table may contain "several sets of information 
.... not expressed in the same unit, but they .... bear some rela- 
tion to each other."

Arithmetic textbooks say, "We cannot express the ratio of a horse 
to a sheep," and "No ratio exists between five tons and 30 days." Yet 
even a brief experience in statistics shows that it is exactly such pairs of 
unlike items usually expressed in two different units that do provide 
the material for many statistical ratios in common use. Examples are: 
the rate of production per day or per acre, the income per capita, 
freight revenue per mile of railway, or bad debt losses per dollar of sales. 

Such ratios are permissible in statistics because, as previously noted, 
the statistical ratio is not an abstract quotient. Dollars of revenue are 
not actually divided by miles, nor bushels by acres. The statistical ratio 
is merely a simplified statement of a factual relationship that does exist 
in each case between numerator and denominator items. For example, 
the total number of bushels of wheat that is produced depends upon 
the total number of acres under production; hence it is justifiable to 
divide the first number by the second in order to arrive at a simpler 
figure which will indicate the average number of bushels produced 
per acre cultivated. 

Careful scrutiny is necessary in many cases in order to ascertain 
whether the items that are being compared are really like or unlike. 
This is particularly true of items measured in dollar value. The dollar 
appears to be the same unit whether it represents dollars of credit or 
dollars of sales; hence dollar values are readily combined in ratios. 
In other instances, a word such as "persons," "products," etc., may be 
used in both terms of a ratio, but unless the word is defined identically 
in the two terms the ratio is between unlike items and is subject to 
definite limitations. 



In the construction and use of ratios between unlike items whether 
expressed in the same or different units there are three points of caution 
to be observed. 

1. The numerator and denominator items although representing 
different objects or values must be identically defined in both time and 
space. 

Table 21, page 168, contains three different sets of data: tele- 
phones, messages, and revenue, each of the latter two being subdivided 
according to attribute. All three have the same time classification in 
the stub and since there is no space classification the spatial characteris- 
tic is identical for every item in the table, that is, each represents terri- 
tory served by the Bell Telephone System. In every horizontal row, 
therefore, the time and space characteristics are identical for all three 
sets of items and ratio relationships may be looked for between them. 
Thus in 1931 the ratio of the number of local messages to the number
of telephones was 1,475 messages per telephone; the total revenue per
telephone in the same year was $68; the average payment per toll call
in the same year was $.33.

Obviously one would not compare the number of telephones in 1931 
to the revenue for a different year nor if state data were available 
would the number of messages in New York State be compared to the 
total number of telephones in New Jersey. A complete table such as 
this one is not always available when single ratios are being used, but 
it is always possible to reconstruct the table headings in outline in order 
to test whether the unlike items being used in any given ratio relation- 
ships do conform to this rule of identical time and space for numerator 
and denominator. 

2. There must be a very definite relationship, causal or otherwise, 
between numerator and denominator. 

In each of the three ratios between unlike items quoted in the 
preceding paragraph the numerator is in some degree dependent upon 
the denominator item. The messages are dependent upon the tele- 
phones because the telephones must be used in transmitting the 
messages; operating revenue comes into existence only if telephone in- 
struments are in use; and the revenue from toll calls arises from the 
fact that the toll calls have been made. 

It is easy to assume ratio relationships in such cases as these without
giving enough care to the definitions of the items used. The use of
general terms may be correct in some ratios, whereas in other cases 
that appear to be similar a more specific term is needed to bring out 
the desired relationship. For example, the ratio "population per auto- 
mobile registered" might be used in measuring traffic density, but if 
the standard of living is being measured the ratio should be "registered 
passenger automobiles to population." 2 These two ratios also illustrate
the point that the purpose for which a ratio is being used will deter- 
mine which of the two items is dependent upon the other. The fact 
that a certain item such as population is used as the base in the majority 
of the ratios in which it occurs does not prove that it will be the base 
item in every case. 

3. The relation between numerator and denominator must be cor- 
rectly expressed. 

A full and accurately worded statement of the relationship is im- 
portant in any type of ratio but especially so when the numerator and 
denominator represent different things. In particular, one is never a 
"per cent of" the other since in the case of unlike items one could 
not be any number of 100ths of the other. If the base is conveniently
expressed in 100 units, the use of "per cent" is permissible, provided it 
is combined with "number," "value," or some corresponding expres- 
sion. For example, "The number of teachers is 20 per cent of the 
number of students," or "There are 20 per cent as many teachers as 
students," but certainly not "The teachers are 20 per cent of the stu- 
dents." An experiment to determine the effect of fertilizer upon wheat 
yield showed that the yield increased four bushels per acre when 100 
bushels of lime were spread per acre. This statement involves two 
ratios between unlike items. Clearly it would be incorrect to say there 
was a 4 per cent increase either of the lime or of the wheat. 
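Two of the three cautions lend themselves to a mechanical check, which the following Python sketch illustrates: identity of time and space is tested before the ratio is formed, and the result is worded as "so many of one item per so many of the other" rather than as a per cent of the base. The class Item, the function unlike_ratio, and the teacher and student figures are all hypothetical. The second caution, that a real relationship must exist between the items, remains a matter of judgment and is not tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    label: str     # what is counted, e.g. "teachers"
    value: float
    period: str    # time classification
    area: str      # space classification


def unlike_ratio(numerator: Item, base: Item, base_units: int = 1) -> str:
    """Form a ratio between unlike items after checking time and space."""
    if numerator.period != base.period or numerator.area != base.area:
        raise ValueError("numerator and base must agree in time and space")
    x = numerator.value * base_units / base.value
    return (f"{x:g} {numerator.label} per {base_units:,} {base.label} "
            f"({base.area}, {base.period})")


# Hypothetical figures for illustration only.
teachers = Item("teachers", 200, "1940", "District A")
students = Item("students", 1_000, "1940", "District A")

result = unlike_ratio(teachers, students, base_units=100)
print(result)
```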

Ratios between Like Items. Statistical data are considered "like" 
if they are expressed in the same unit and differ with respect to only 
one characteristic, according to the classifications that were used in the 
original tabulation. They may be alike in all attributes and in time, 
differing only in space; they may be identical in attributes and in space 
but different in time; or they may be alike in both time and space and 
in all but one attribute. This last group can be distinguished from the
"unlike" data discussed in the preceding section which are also identical
in time and space. Unlike items may even be expressed in the
same unit of measure but they are differentiated from one another
by separate definitions, which show that the items are in different
categories.

2 See chapter XII for further discussion regarding refinement of definition in the
construction of such ratios.

Referring again to the table of the Bell Telephone System, columns 
4, 5, and 6 may be considered separately as a two-way table of like 
items. Total operating income in the entire system is subclassified ac- 
cording to attribute, local and toll, and is cross-classified according to 
time. Thus in any one row, the data in columns 4, 5, and 6 are alike 
in time and space and differ only in the one attribute according to 
which operating income has been subdivided. They are, therefore,
"like" and may be compared. We may say, for example, that in 1931
the revenue from local service was 68.9 per cent of the total
operating income. Similarly any figure in columns 4, 5, or 6 may be
compared with another figure in the same column. They are alike in
space and in attribute, differing only in time. Thus the toll revenue
decreased from $326,300,000 in 1931 to $243,900,000 in 1933, a
decrease of $82,400,000 or 25.3 per cent.
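The time comparison of toll revenue may be checked as follows. This is a brief Python sketch in which the variable names are assumptions; the two revenue figures are those quoted above, and the earlier year serves as the base.

```python
toll_1931 = 326_300_000   # toll revenue in dollars, 1931 (base year)
toll_1933 = 243_900_000   # toll revenue in dollars, 1933

decrease = toll_1931 - toll_1933
pct_decrease = decrease * 100 / toll_1931

print(f"decrease of ${decrease:,} or {pct_decrease:.1f} per cent")
```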

Items that are listed under a single heading in a table are often 
potentially subject to further subdivision. For example, a single set of 
data headed "United States" might be subdivided according to the 
main geographic divisions, according to Federal Reserve Districts or 
according to the 48 states. "Total wage earners" might be subdivided 
into male and female. They might also be subdivided into age groups 
or by wage rates. The danger of comparisons between items that are 
too general in definition has already been noted in the case of ratios 
between unlike items, and the warning is equally applicable to ratios 
between like items. When subclassifications or refinements in definition 
are available, the maker of ratios should proceed with care before he 
looks for relations between general data that appear to be "like." 
However, his refinements can go no farther than the available data 
will permit. If, according to the classification used, the items are like 
in every characteristic but one, then they may be combined in ratios. 
But the possibility that the relations between such data might be af- 
fected by further subdivision of their characteristics must be kept 
constantly in mind in drawing conclusions from these ratios. This 
point becomes of special importance when comparisons are drawn 
between two or more ratios and will be discussed further in a later 
section. 



In conformity with the fundamental classifications of data, a ratio 
between like items may be classified as a time, space, or attribute ratio 
according to the one respect in which the numerator item differs from 
the denominator. A second method of classifying a ratio is according 
to whether (1) the numerator item is a part of the denominator item; 
(2) the numerator and denominator are separate parts of the same 
total; (3) the numerator and denominator items are separate totals. 
The mechanics of ratio construction will be discussed according to 
these part-total relationships and in each case any differences in the 
treatment of time, space, and attribute ratios will be pointed out. 

Part-to-total: This type of ratio is used chiefly in space and attribute 
comparisons. Items that differ in time also become material for part- 
to-total ratios when they are of such a nature that they make up a 
cumulative total, as for example monthly production figures for a 
given year. The method of construction and use of part-to-total ratios 
is identical for all three types of comparison. 

Part-to-total ratios take two common forms: (1) the comparison of 
a single part to the whole and (2) a percentage distribution in which 
all the parts are shown as percentages of the whole or 100 per cent. 

Single Ratios: Examples of single part-to-total ratios are the per- 
centage of manufactured products in the state of Michigan produced 
in the Detroit area; the number of factory workers over 65 years of age 
per 1,000 factory workers and the number of high-school graduates 
entering college per 1,000 high-school graduates. In each of these ex- 
amples the part selected for comparison with the total is chosen to 
demonstrate a particular point and so far as that demonstration is con- 
cerned nothing need be known about the other parts of the total except 
that they exist. The first example was a spatial ratio and the others 
were attribute ratios. 

When part-to-total time ratios are to be constructed, a distinction 
must be made between series in which there is no overlapping between 
the separate items and those in which the quantities or values do over- 
lap. In the first type the separate parts can be added to make up a 
total for a longer period of time, as for example, the sum of the ex- 
ports for each of 12 months in a year will equal the total year's exports 
and it follows that the data may be used in part-to-total ratios. Series
like this are quite different in nature from those which are recorded
at similar time intervals but which represent overlapping quantities or
values. Such time series as number of employees, number of acres
under production, population or assessed value of property cannot be
added to form totals. Consequently no part-to-total ratios can be 
constructed from them. 

The telephone table (Table 21, page 168) may again be used to 
illustrate the contrast in these two kinds of time series. Columns 2 
and 3 show the number of messages of a certain kind that were trans- 
mitted during each year from 1931 to 1936. Here there is no overlapping:
every message counted in 1931 is distinct from those counted
in each of the other years. If there were any special significance in the 
six-year period, the number of messages of each kind could be totaled 
and the ratio of any one year to the total period could be used. 
Columns 4, 5, and 6 which show the operating income for each year 
likewise consist of non-overlapping items and could be treated in the 
same way. However, the items in column 1, the number of telephones 
in use during each year, cannot be added to give a total. They are 
obviously overlapping data, since most of the 15,390,000 instruments 
in use in 1931 are also counted among the 13,793,000 used in 1932. 
Some new ones have been added while some of the old ones have been 
disconnected and like changes have occurred every year. Since the 
separate items do not constitute a total, no part-to-total ratios can be 
made from them. The only possible ratios would be those between 
two single figures in the same column, that is, total-to-total ratios. Time 
ratios of this kind will be discussed in a later section. 
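The distinction between the two kinds of time series can be made explicit in a computation. In the Python sketch below, the names SeriesKind and part_to_total are illustrative assumptions, and the quarterly export figures are hypothetical; the telephone counts (in thousands) are those quoted above for 1931 and 1932. The function refuses to form part-to-total ratios from an overlapping series.

```python
from enum import Enum


class SeriesKind(Enum):
    NON_OVERLAPPING = "non-overlapping"   # e.g. messages sent during each year
    OVERLAPPING = "overlapping"           # e.g. telephones in use each year


def part_to_total(series: dict[str, float], kind: SeriesKind) -> dict[str, float]:
    """Per cent of the period total contributed by each item."""
    if kind is SeriesKind.OVERLAPPING:
        raise ValueError("overlapping items do not add to a meaningful total")
    total = sum(series.values())
    return {period: value * 100 / total for period, value in series.items()}


# Hypothetical quarterly exports, for illustration only.
exports = {"Q1": 200.0, "Q2": 300.0, "Q3": 250.0, "Q4": 250.0}
shares = part_to_total(exports, SeriesKind.NON_OVERLAPPING)
print(shares)

# Telephones in use (thousands), 1931 and 1932, as quoted in the text.
telephones = {"1931": 15_390, "1932": 13_793}
try:
    part_to_total(telephones, SeriesKind.OVERLAPPING)
    refused = False
except ValueError as err:
    refused = True
    print("refused:", err)
```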

Percentage Distributions: The same types of data that are suitable 
for single part-to-total comparisons can be presented as percentage dis- 
tributions. This is a ratio technique that gives emphasis to the relative 
importance of each of the parts that make up a total. The several 
numerator items are each expressed in terms of 100 units of the same 
denominator, the denominator being equal to the sum of the numerator 
items. Table 31 shows the amounts loaned on non-farm mortgages by 
different types of lending institutions during five months of 1939 with 
a percentage distribution of the several items. The per cent column 
shows more clearly than the original data that savings-and-loan asso- 
ciations were the most important lending agencies during this period, 
and that banks and trust companies were second. The least business 
was done by insurance companies and mutual savings banks with 8.8 
per cent and 3.2 per cent, respectively, or only 12 per cent for the two 
combined. 
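A percentage distribution of this kind may be computed as in the following Python sketch, which uses the dollar figures of Table 31 (in millions). The variable names are assumptions. Because the inputs are already rounded to tenths of a million, two of the computed per cents (savings-and-loan associations and individuals) come out one tenth lower than the published column, and the rounded parts total 99.8 rather than 100.0; the published per cents were presumably derived from unrounded data or adjusted to total 100.

```python
# Value of non-farm mortgage recordings, millions of dollars (Table 31).
recordings = {
    "Savings-and-loan associations": 431.8,
    "Insurance companies": 127.1,
    "Banks and trust companies": 359.2,
    "Mutual savings banks": 46.7,
    "Individuals": 263.7,
    "Others": 208.8,
}

total = sum(recordings.values())

# Each part expressed per 100 units of the common total.
percent = {lender: round(value * 100 / total, 1)
           for lender, value in recordings.items()}

for lender, share in percent.items():
    print(f"{lender:<32}{recordings[lender]:>8.1f}{share:>7.1f}")
print(f"{'Total':<32}{total:>8.1f}")
```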

TABLE 31

NON-FARM MORTGAGE RECORDINGS IN THE UNITED STATES BY TYPE OF
MORTGAGEE, FIRST FIVE MONTHS OF 1939*

                                     VALUE OF                PER CENT
                                     MORTGAGE RECORDINGS     OF TOTAL
  TYPE OF LENDER                     (000,000 omitted)       RECORDINGS

  Savings-and-loan associations      $431.8                  30.1
  Insurance companies                127.1                   8.8
  Banks and trust companies          359.2                   25.0
  Mutual savings banks               46.7                    3.2
  Individuals                        263.7                   18.4
  Others                             208.8                   14.5

  Total                              $1,437.3                100.0

* Federal Home Loan Bank Review, Vol. 5, No. 10 (July, 1939), p. 311. Federal Home
Loan Bank Board, Washington, D. C.

Table 32 presents a part-to-whole analysis which resembles the preceding
one in form. At first glance it might appear to be another percentage
distribution, in this case of items differing in space. The
percentage living on farms varies from 2.4 in Rhode Island to 31.4 
in Vermont and a total of these per cents happens to be about 100, 
which might be assumed to represent the total for the New England 
and Middle Atlantic States. However, a closer inspection shows that 
this is not a percentage distribution but a series made up of the first 
type of part-to-whole ratios. The separate ratios are not comparisons 
in space but of attribute, i.e., residence on farms is an attribute of a 
part of the population of each state. Each of the ratios has been com- 
puted from a different base, the total population of that state; there- 
fore they cannot be added to give a total that has any meaning. The 
percentage of the total population living on farms in all of the states 
together must be computed from the total original data, the same as 
was done for each separate state. 
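The reason the separate per cents cannot be added or averaged is easily shown with invented figures. The Python sketch below uses two hypothetical states, not census data: the combined per cent must be computed from the summed original data, and it differs widely from the simple average of the two state per cents because the bases are unequal.

```python
# Hypothetical (farm population, total population) pairs, for illustration only.
states = {
    "State A": (50_000, 1_000_000),
    "State B": (30_000, 100_000),
}

# Each state's per cent is computed on its own base.
per_state = {name: farm * 100 / pop for name, (farm, pop) in states.items()}

# Correct combined figure: total farm population over total population.
total_farm = sum(farm for farm, _ in states.values())
total_pop = sum(pop for _, pop in states.values())
combined = total_farm * 100 / total_pop

# Misleading figure: the simple average of per cents with different bases.
naive_average = sum(per_state.values()) / len(per_state)

print(per_state)
print(round(combined, 1), round(naive_average, 1))
```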

TABLE 32

PER CENT OF TOTAL POPULATION LIVING ON FARMS IN NORTHEASTERN STATES,
1930 CENSUS*

                        PER CENT
                        LIVING
  STATE                 ON FARMS

  Maine                 21.4
  New Hampshire         13.5
  Vermont               31.4
  Massachusetts         2.9
  Rhode Island          2.4
  Connecticut           5.4
  New York              5.7
  New Jersey            3.2
  Pennsylvania          8.9

* Statistical Abstract, 1936, p. .



Errors To Avoid in Percentage Distributions: Sometimes per cents of 
a total are quoted that do not amount to 100 per cent, usually due to 
some kind of carelessness. This error can be avoided if all the per 
cents are quoted in tabular form, including the 100 per cent total. 8 

Other examples, such as Table 33, may be found in which the total 
of a percentage distribution greatly exceeds 100, not because of error 
in the computations but because the table contains two distributions 
instead of one. In this case further confusion is added because in each 
of the two the data have been distributed in a double classification. 
Thus there is no clear distribution according to each characteristic 
separately. In the source from which Table 33 was taken the per- 
centages and primary data were all given in a single column which 
was even more confusing than in the form shown, since there was no 
indication which items together totaled 100 per cent. A more usable 
form for the data is shown in Table 34 which is practically equivalent 
to two separate tables. In this form comparisons are immediately 
apparent between the percentages normal and defective in the various 
categories. 

TABLE 33

CLASSIFICATION OF DEFECTS BY SEX AND NATIVITY
FOURTH-CLASS SCHOOL DISTRICTS, PENNSYLVANIA, 1917-18*

                        NUMBER        PER CENT

  Total male            240,553
    Normal              55,735        11.5
    Defective           184,818       38.1
  Total female          244,455
    Normal              63,858        13.2
    Defective           180,597       37.2
  Total native          464,034
    Normal              115,671       23.9
    Defective           348,363       71.8
  Total foreign         20,974
    Normal              3,922         0.8
    Defective           17,052        3.5

* Departmental Statistics (Commonwealth of Pennsylvania, Department of State and
Finance, Harrisburg, Pa., 1925), p. 72.

This same type of error may appear in a number of different forms, 
in all of which the mistake lies in the attempt to show too much in 
one distribution. Percentages of subtotals should not appear in the 
same column with percentages of the total distribution unless italicized 
or otherwise unmistakably distinguished. It is preferable to make 

8 For a discussion of significant figures in percentage distributions refer to 
chapter VIII, pages 168-69. 



RATIOS 



243 



several short tables, each showing one set of relationships clearly. 
Other isolated percentage relationships that do not warrant the con- 
struction of a special table may be pointed out in the text and in any 
such case the original data should be quoted along with the per cent. 

TABLE 34

PUPILS IN FOURTH CLASS SCHOOL DISTRICTS IN PENNSYLVANIA: NUMBER AND PER CENT
NORMAL AND DEFECTIVE ACCORDING TO SEX AND NATIVITY, 1917-18

                            SEX                              NATIVITY
                                                                Foreign-
               Male       Female     Total        Native       Born         Total

  NUMBER

  Normal       55,735     63,858     119,593      115,671      3,922        119,593
  Defective    184,818    180,597    365,415      348,363      17,052       365,415

  Total        240,553    244,455    485,008      464,034      20,974       485,008

  PER CENT

  Normal       23.2       26.1       24.7         24.9         18.7         24.7
  Defective    76.8       73.9       75.3         75.1         81.3         75.3

  Total        100.0      100.0      100.0        100.0        100.0        100.0
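The difference between the two presentations lies entirely in the choice of base. The Python sketch below, with illustrative variable names, computes the per cents of Table 34 by taking each group's own total as the base, and then reproduces one Table 33 figure by taking the grand total of 485,008 pupils as the base instead.

```python
# (normal, defective) pupils in each group, from Tables 33 and 34.
counts = {
    "Male": (55_735, 184_818),
    "Female": (63_858, 180_597),
    "Native": (115_671, 348_363),
    "Foreign-born": (3_922, 17_052),
}

# Table 34 form: each group is its own 100 per cent.
within_group = {}
for group, (normal, defective) in counts.items():
    group_total = normal + defective
    within_group[group] = (
        round(normal * 100 / group_total, 1),
        round(defective * 100 / group_total, 1),
    )

# Table 33 form: every item is related to the total of all pupils.
all_pupils = sum(counts["Male"]) + sum(counts["Female"])
male_normal_of_all = round(counts["Male"][0] * 100 / all_pupils, 1)

print(within_group)
print(all_pupils, male_normal_of_all)
```

Only the first form permits a direct comparison of the proportion normal among males with that among females, or among native with foreign-born pupils.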

















One of the most frequent misuses of a percentage distribution re- 
sults from the inclusion of a miscellaneous class. Such a class may 
contain (1) items which are known to be distinct from those included 
in the separate classes or (2) items that are unknown or poorly 
defined. 

1. If a class is designated as "All other," "Others," or "Not else- 
where classified," it indicates that a number of less important classes 
of the distribution have been combined in order to conserve space, to 
concentrate the reader's attention on the important items or to avoid 
disclosing confidential information. The characteristics of all these 
other items are known and they definitely do not belong in any of 
the specifically named classes. No single class included among 
"Others" should be larger than the smallest class that is named sepa- 
rately, although the total of the combined "Others" may be greater. 
In Table 31, "Others" presumably includes endowment funds, non- 
profit institutions, etc., each of which is distinct from and less import- 
ant as a mortgage investor than the separately listed lenders of the 
table. Under such circumstances the information contained in the 
specific classes loses none of its accuracy by reason of the inclusion of a
miscellaneous class. A percentage distribution that includes the 
"Others" as one of the parts of the 100 per cent total will therefore 
correctly represent the relation of each part to the total. 

2. In tabulating primary data it frequently happens that the answers 
to certain questions are missing from some of the collected schedules. 
Faulty questionnaire planning may likewise result in a group of poorly 
defined answers that cannot be classified precisely. Such cases must be 
grouped in an "Unknown" or "Not reported" class, although at least 
some of them should have been included in one or more of the known 
classes. The calculation of a percentage distribution with this unknown 
group as a component part of the total would therefore distort the 
true relation of each of the specific groups to the total. An alternative 
method of dealing with this situation will depend upon the circum- 
stances surrounding the collection of the data. 

a) If it can be assumed that the known cases comprise a repre- 
sentative sample of the total, the unknown group even if relatively 
large may be dropped and a percentage distribution computed of the 
total known cases. This is justifiable in any case if the unknown group 
is relatively small, since the omission of a few items from one or more 
groups will not materially affect the percentage relationships. A foot- 
note may be added stating the number of items omitted and what per 
cent they are of the total number investigated. 

b) If a large unknown group has resulted from some element of 
bias in answering the questions, the distribution of known items can 
not be assumed to be representative. In such cases no percentage dis- 
tribution should be computed and indeed the original data are of ques- 
tionable value. 
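
The treatment described under (a) can be illustrated with a short Python sketch. The counts, the class labels, and the function name are hypothetical and serve only to show the arithmetic; the sketch assumes, as the rule itself requires, that the known cases are representative of the total.

```python
def known_case_distribution(counts, unknown_label="Not reported"):
    """Percentage distribution computed on the known cases only.

    Also returns the two figures a footnote should report: the number
    of items omitted and their per cent of the total investigated.
    """
    total = sum(counts.values())
    omitted = counts.get(unknown_label, 0)
    known_total = total - omitted
    distribution = {
        label: round(100 * n / known_total, 1)
        for label, n in counts.items()
        if label != unknown_label
    }
    pct_omitted = round(100 * omitted / total, 1)
    return distribution, omitted, pct_omitted


# Hypothetical schedule counts, for illustration only.
counts = {"Owners": 600, "Renters": 360, "Not reported": 40}
dist, omitted, pct_omitted = known_case_distribution(counts)
print(dist)
print(f"Footnote: {omitted} items omitted ({pct_omitted} per cent of the total)")
```

Under case (b), where the unknown group is large and may reflect bias, no such function should be applied at all; the decision is one of judgment, not of arithmetic.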

Table 30 illustrates a so-called miscellaneous class that is really un- 
known. The source from which the table was taken gave no direct or 
collateral information to indicate whether the ten accidents classified 
as miscellaneous were attributable to causes other than those listed or 
whether several of them were not allocated because of insufficient 
information. If the cases in the miscellaneous class are independent 
of the listed causes, then none of them should be more important than 
the listed causes. Since the table lists one accident involving railroad 
cars it would follow that ten different causes of one accident each are 
included in the miscellaneous class. While this situation is quite pos- 
sible it seems more likely that these ten accidents have been grouped 
in a miscellaneous class because of insufficient information to allocate
them. If this is the correct interpretation, then the entire table is worth- 
less because the allocation of these ten cases to specific causes might 
change completely the distribution of cases in the three classes. 

Part-to-part and total-to-total: Ratios between two like items neither 
one of which is a total including the other may be either part-to-part or 
total-to-total. From the point of view of ratio construction they 
may be considered together since there is no essential difference in 
method. 

In the case of space ratios the difference is only in the point of view. 
The areas of Canada and the United States may be regarded as sepa- 
rate totals or they may be two of the component parts of the total area 
of North America. From either viewpoint the area of Canada to the 
United States is in the ratio of 106 : 100, or it is 106 per cent as great 
as the area of the United States. 

In time series when the data are non-overlapping they may be re- 
garded either as separate totals or as parts of a larger total; if they do 
overlap they are always separate totals. However, the method of 
comparing one item with another or with an average or other standard 
is the same in either case. 

Attribute ratios may appear to be comparisons between separate 
totals but if they are made up of genuinely "like" items a broader
definition can be found under which they will range themselves as two 
component parts of a larger total. If the two items that are being 
compared can in no sense be regarded as mutually exclusive parts of a 
total, then they are not attribute ratios of like items, even though they 
appear to be expressed in the same unit. They are instead ratios be- 
tween unlike items and are subject to the limitations already mentioned 
under that head. For example, the results of a study of radio advertis- 
ing yielded the following sets of data: total number of persons inter- 
viewed; number who listened to a given radio program; number who 
bought the product advertised on the program. All three of these sets 
of data used the general unit "persons." This unit had been subdivided 
in two ways: listeners and non-listeners; buyers and non-buyers. Rela- 
tionships between listeners and non-listeners, buyers and non-buyers or 
listener-buyers and listener-non-buyers were genuine part-to-part ratios. 
But "total listeners" and "total buyers" were not mutually exclusive 
categories under the general term "persons." Hence for the purpose at 
hand they were unlike items. A ratio between them would have been 
valid only if the number of one group were in some way dependent
upon the number in the other group, an assumption that would have 
been difficult to justify. 
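
The distinction drawn in the radio example may be easier to see with figures. The following Python sketch uses hypothetical interview counts (the study's actual data are not given in the text) to show which pairs of groups are mutually exclusive parts of the total and which are not.

```python
# Hypothetical counts of persons interviewed, cross-classified two ways.
counts = {
    ("listener", "buyer"): 120,
    ("listener", "non-buyer"): 280,
    ("non-listener", "buyer"): 60,
    ("non-listener", "non-buyer"): 540,
}

total = sum(counts.values())
listeners = sum(n for (heard, _), n in counts.items() if heard == "listener")
non_listeners = total - listeners
buyers = sum(n for (_, bought), n in counts.items() if bought == "buyer")

# Mutually exclusive parts of one total: a genuine part-to-part ratio.
listener_buyers_per_100_listener_non_buyers = round(
    100 * counts[("listener", "buyer")] / counts[("listener", "non-buyer")]
)
print(listener_buyers_per_100_listener_non_buyers)

# "Total listeners" and "total buyers" overlap: the listener-buyers are
# counted in both, so the two groups are not parts of a single total.
overlap = counts[("listener", "buyer")]
print(listeners + non_listeners == total)   # exclusive parts exhaust the total
print(listeners + buyers - overlap)         # persons in either group
```

Because the listener-buyers belong to both groups, a ratio of total buyers to total listeners compares unlike items in the sense used above.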

Usefulness of Part-to-Part Ratios: Ratios between the several parts 
will frequently provide more exact information than the ratios of each
part to the total. In the field of vital statistics such ratios as the 
number of male births to the number of female births, foreign-born 
to native, urban to rural, and white to colored population are in 
common use. In these cases the corresponding part appears to be 
a more natural standard than the total of the two. Furthermore, 
the use of a small base emphasizes the degree of difference between 
the two parts more effectively than if each were compared with 
the total. 

The part-to-part ratio is equally advantageous in the field of 
business. Table 31 showed that out of every $100 of new mortgage 
loans $30 were made by savings-and-loan associations, and $25 by 
banks and trust companies. A part-to-part ratio would afford the more 
direct statement that only $83 was loaned by banks and trust com- 
panies for every $100 by savings-and-loan associations. Or a statistician 
employed in a mutual savings bank might state that for every $100 
loaned by that type of bank $925 was put out by savings-and-loan
associations. 
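
The conversion from part-to-total figures to a part-to-part statement is a single division. A minimal Python sketch, using the two shares quoted above from Table 31 (the function name is an assumption made for illustration):

```python
def per_hundred(part, base):
    """Amount of `part` for every 100 units of `base`."""
    return 100 * part / base


# Shares of every $100 of new mortgage loans, as quoted in the text.
savings_and_loan = 30
banks_and_trusts = 25

# Banks and trust companies, with savings-and-loan associations as base.
print(round(per_hundred(banks_and_trusts, savings_and_loan)))
# The same pair with the base reversed.
print(round(per_hundred(savings_and_loan, banks_and_trusts)))
```

Reversing the base changes the figure from 83 to 120, which is why the base must always be named in the statement.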

This example brings out the point that the purpose of such ratios 
is the comparison of one item to another as a standard of measure; 
therefore either item may be used as the base according to the emphasis 
desired. Whether the part-to-part relation has greater significance 
than part-to-total will also depend upon the emphasis needed in 
each case. This becomes especially important when two or more 
sets of such ratios are being compared, usually at different periods 
of time. 

Percentage Relation: Since part-to-part ratios as well as part-to-total 
ratios are usually expressed in terms of per cents, precise terms must 
be used in expressing either kind of ratio in order to avoid ambiguity 
or misstatement. Furthermore in stating a part-to-part relationship, 
one item is no more a "per cent of" the other than is the case with 
ratios between unlike items. The sales of chain grocery stores and 
of independent grocery stores in a community for a given year might 
appear as follows: 

Chain-store sales . . . . . . .  $250,000
Independent-store sales . . . .   200,000

If a statement were made, "The independent-store sales were 80 per 
cent," this could be taken to mean 80 per cent of the two combined. 
"The independent-store sales were 80 per cent of the chain-store sales" 
would imply that independents were a part of the chains instead 
of an entirely different type of grocery. "The relation between the 
two is 80 per cent" fails to indicate which one is used as the standard. 
The following are some of the correct statements that can be made: 
"Independent-store sales were 80 per cent as great as chain-store 
sales;" "Sales in the chain stores amounted to 125 per cent as much 
as the independent-store sales." 

Percentage Difference: The relation between two parts is very fre- 
quently expressed as a percentage difference and may be computed by 
either of two methods: (1) by subtracting 100 per cent from the 
percentage relation, computed on either item as base or (2) subtracting 
the item selected as base from the numerator and dividing the re- 
mainder by the base item. Due regard must be taken throughout for 
algebraic signs according to either method. Using the same example 
of grocery store sales: 

$$80\ \text{per cent} - 100\ \text{per cent} = -20\ \text{per cent}$$

or

$$200 - 250 = -50 \quad\text{and}\quad \frac{-50}{250} = -.20\ \text{or}\ -20\ \text{per cent.}$$

Again the wording must be precise and the base must be clearly 
indicated. "The difference between chain-store sales and independent 
sales was 20 per cent" does not tell which type of store has been used 
as the base or which had the greater sales. "Sales in independent 
stores were 20 per cent less than in chain stores" is a much clearer 
statement; or, if independent stores are selected as the base, "Sales in 
chain stores were 25 per cent greater than in independents," or
"exceeded sales in independents by 25 per cent." Note that whenever
the base is changed the percentage difference will change in amount as 
well as in direction. 
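
The two methods of computing a percentage difference, and the effect of changing the base, can be checked with a few lines of Python. This is only a sketch; the function names are chosen for illustration, and the figures are the grocery-store sales used above.

```python
def percentage_relation(item, base):
    """The item expressed as a per cent of the base."""
    return 100 * item / base


def difference_from_relation(item, base):
    """Method 1: subtract 100 per cent from the percentage relation."""
    return percentage_relation(item, base) - 100


def difference_from_amounts(item, base):
    """Method 2: subtract the base from the item, then divide by the base."""
    return 100 * (item - base) / base


chain, independent = 250_000, 200_000

# Chain stores as base: independents were 20 per cent less.
print(difference_from_relation(independent, chain))
print(difference_from_amounts(independent, chain))

# Independents as base: chains were 25 per cent greater.
print(difference_from_relation(chain, independent))
print(difference_from_amounts(chain, independent))
```

Both methods agree for a given base, and the sign of the result shows the direction of the difference.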

Precision of statement is particularly necessary when the part-to-part 
or total-to-total ratios are time relationships. Differences between two 
items that are identical except in time are best expressed as per cents 
of positive or negative change, or per cents of increase or decrease, the 
methods of computation being the same as for deriving percentage dif- 
ference in space and attribute ratios. 

Table 35 provides examples of a number of time ratios in each 
of which an item in October, 1937, is compared with an identically 



248 



BUSINESS STATISTICS 



defined item in October, 1936. All of these examples are total-to-total 
comparisons rather than part-to-part, since the two months compared 
are corresponding parts from different years instead of parts of the 
same year. The first four indicators are non-overlapping series but 
the fifth represents overlapping data. Despite this difference the same 
kind of wording can be used in reading all of the ratios in column 4: 
"In October, 1937, the production of steel ingots showed a 25.2 per 
cent decrease in comparison with the same month of the previous year"; 
"The number of cotton spindles active in October, 1937, showed but 
little change since October a year ago, an increase of only .3 per cent." 

TABLE 35

INDICATORS OF BUSINESS ACTIVITY, OCTOBER, 1937 AND OCTOBER, 1936*

                                           AMOUNT OR VALUE          (3)              (4)
                                        ----------------------   PERCENTAGE      PERCENTAGE
BUSINESS INDICATOR                         (1)          (2)       RELATION        OF CHANGE
                                        Oct., 1936   Oct., 1937   (2) / (1)    [(2) - (1)] / (1)
                                                                   X 100      X 100 OR (3) - 100%

Steel ingot production (thous. tons)       4,534        3,393       74.8          - 25.2
Domestic auto. sales (Gen. Mot.)
  (number)                                44,274      107,216      242.2          +142.2
Bituminous coal production
  (thous. tons)                           43,321       40,040       92.4          -  7.6
Building contracts (mill. dollars)           226          202       89.4          - 10.6
Cotton spindles active
  (thous. spindles)                       23,662       23,724      100.3          +   .3





* The Annalist, Vol. 50, No. 1295 (November 12, 1937), pp. 796-97 and Vol. 50, No. 1296 
(November 19, 1937), pp. 836-37. New York Times Co., New York. 
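
Columns 3 and 4 of Table 35 can be recomputed from columns 1 and 2. The Python sketch below does so as a check on the printed figures; the variable and function names are illustrative assumptions, while the amounts are those given in the table.

```python
# (indicator, Oct. 1936 amount, Oct. 1937 amount), from Table 35.
indicators = [
    ("Steel ingot production (thous. tons)", 4534, 3393),
    ("Domestic auto. sales, Gen. Mot. (number)", 44274, 107216),
    ("Bituminous coal production (thous. tons)", 43321, 40040),
    ("Building contracts (mill. dollars)", 226, 202),
    ("Cotton spindles active (thous. spindles)", 23662, 23724),
]


def relation_and_change(earlier, later):
    """Percentage relation and percentage of change, earlier period as base."""
    relation = round(100 * later / earlier, 1)
    change = round(100 * (later - earlier) / earlier, 1)
    return relation, change


results = [relation_and_change(y1936, y1937) for _, y1936, y1937 in indicators]
for (name, _, _), (relation, change) in zip(indicators, results):
    # The "+" in the format forces an explicit sign on every change.
    print(f"{name:42s} {relation:6.1f} {change:+7.1f}")
```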

If in Table 35 the comparisons had been with the previous month 
the first four might be regarded as part-to-part ratios but the wording 
would be no different except that September, 1937, would be named 
instead of October of the preceding year. Frequently the period used 
as a standard in constructing such ratios is not clearly indicated. A 
newspaper headline may read, "Department-Store Sales Jump Seven 
Per Cent in August," but careful reading of the article discloses that 
the 7 per cent gain was not since July of the same year, as one might 
assume, but since August of the preceding year. In this connection 
it should be noted that in comparing two time ratios, both of which 
are based on the same previous standard as 100 per cent, these ratios 
or "index numbers" are handled exactly as if they were primary data. 
That is, the relation or difference is found by dividing one by the
other, not by subtracting one index from the other. In the example 
mentioned the index of department-store sales rose from 83 in August, 
1938, to 89 in August, 1939, an increase of 6 points on a base of 83 
which is at the rate of 7 per 100 or 7 per cent. 
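
The difference between a change in index points and a change in per cent is shown by the department-store example. A brief Python sketch using the two index values quoted above (the variable names are illustrative):

```python
index_aug_1938 = 83
index_aug_1939 = 89

# Subtraction gives the change in index points, that is, in per cent
# of the original base period of the index.
points = index_aug_1939 - index_aug_1938

# Division by the earlier index gives the per cent of change since
# August, 1938, which is what the headline reported.
pct_change = 100 * points / index_aug_1938

print(points)
print(round(pct_change, 1))
```

The 6 points and the 7 per cent agree only when the earlier index happens to stand at 100.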

Distinguishing between Percentage Relation and Percentage Differ- 
ence: There is seldom any difficulty in distinguishing between percent- 
age relation and percentage difference in either space, attribute, or time 
ratios provided the difference is less than 100 per cent. When the 
difference is large, it is easy to forget that one item must be subtracted
from the other to obtain the percentage difference. Using the previous 
example of grocery-store sales, suppose that two years earlier the sales 
of the chain stores were only $25,000. Then the sales for the later 
year divided by the sales for the earlier year equal + 10 or + 1000 
per cent. The sales in the later year therefore were 1000 per cent 
as great as the sales in the earlier year. To obtain the difference, 100 
per cent must be subtracted, leaving an increase of 900 per cent. Or, 
find the difference between the two years, that is, the later year
minus the earlier or base year, and divide by the earlier year: $250,000 −
$25,000 = +$225,000; +$225,000 ÷ $25,000 = +9 or 900 per cent increase.
Further illustrations can be found by comparing column 3 with 
column 4 in Table 35. 

As has already been indicated the base item in time ratios is prac- 
tically always the earlier period. Failure to observe this rule leads 
to still further confusion in the expression of percentage relation or per 
cent of increase or decrease, as illustrated in the following quotation: 

Making Hilarity Pay. The large majority of the bootleggers have now 
cut their prices from 200 to 300 per cent in a desperate effort to meet the 
competition of the State Liquor Stores. (Newspaper clipping.) 

The reader would assume from the word "cut" that an earlier period 
had been used as the base. Whatever the former price may have been, 
a cut of 100 per cent would reduce it to zero, hence any greater decline 
would mean that the bootleggers were paying the purchasers to take 
their wares. A decline or decrease can never exceed 100 per cent. 
Very likely what happened was that liquor formerly selling at $3.00 
per quart was reduced to $1.00 or $.75. The difference of 200 or 300 
per cent was found by using the later period as the base of the ratio. 
Assuming that the present price is $.75, the method should have been 
as follows: $.75 ÷ $3.00 = .25 or 25 per cent. Subtracting 100 per
cent leaves −75 per cent. Thus the present price is ¼ the former price
or has decreased 75 per cent. There were two errors in the quoted 
statement: (1) the later instead of the earlier year was used as the 
base in computing percentage change and (2) the difference was 
incorrectly interpreted as percentage decrease instead of the per cent 
by which the past exceeded the present. There are a few occasions 
on which a later period may be used as the base, as when we say 
that the output of a plant was 10 per cent higher last year than this, 
or pre-war prices were 20 per cent below the current level, but 
examples of this kind occur so infrequently that they probably would 
better be disregarded entirely because they tend to confuse the unwary. 
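
The two errors in the newspaper statement can be reproduced in a few lines. The Python sketch below uses the prices assumed in the discussion above; the function name is an illustrative assumption.

```python
def pct_change(earlier, later):
    """Per cent of change with the earlier period as base."""
    return 100 * (later - earlier) / earlier


former_price = 3.00

# Correct method: earlier price as base.
print(pct_change(former_price, 0.75))   # a 75 per cent decrease
print(pct_change(former_price, 0.0))    # the greatest possible decrease

# The clipping's method: later price used as base.
print(100 * (former_price - 0.75) / 0.75)
print(100 * (former_price - 1.00) / 1.00)
```

With the later price as base the results are 300 and 200, the figures in the clipping; they measure how far the former price exceeded the present one, not how far prices were cut.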



PRESENTATION OF RATIOS 

Considerable attention has been devoted to proper presentation, 
both in text and tabular form, during the discussion of the construc- 
tion of various kinds of ratios. These rules need only be reviewed 
briefly together with some reference to chapter VIII and with certain 
additions regarding ratio presentation in general. 

In Text 

The following points should be observed in any textual reference 
to ratios: 

1. The exact scope of both numerator and denominator should be 
fully defined unless very clearly understood. 

2. The expression of each ratio should be precisely and accurately 
worded according to the kind of relationship involved, leaving no 
possibility of misunderstanding. 

3. If a ratio that does not actually appear in an accompanying 
table is used in the text, the data from which the ratio is derived 
should be quoted along with it. 

In Tabular Form 

The following rules will be a guide in tabular presentation: 
1. The rule of definite and adequate headings in presenting pri- 
mary data in tables also applies to ratios. If the original data are not 
included in the table the numerator and denominator items as well as 
the direction of relationship between them must be clearly defined in
the table. If the data are listed in parallel columns, they may be 
referred to by column number, as in Table 35. 

2. A separate derivative table should be made for each type of 
ratio comparison drawn from a given set of primary data. In particular, 
percentage distributions according to both horizontal and vertical classi- 
fications or according to more than one category should not be presented 
in a single table. 

3. Every percentage distribution should include a 100 per cent total 
and the separate per cents must add to 100 except for having been 
rounded off according to the rule for significant figures. Carrying 
per cents to too many decimal places gives a false impression of 
accuracy. 

4. Percentages of difference or change must be clearly indicated 
as positive or negative. 

5. Whenever possible, the data from which ratios have been 
derived should be shown along with the ratios. 
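
Rules 3 and 4 lend themselves to a mechanical check. The following Python sketch computes a rounded percentage distribution, tests whether it adds to 100 within the allowance that rounding permits, and prints changes with an explicit sign. The function names and the half-unit rounding allowance are assumptions made for illustration; the counts are the male pupils of Table 34.

```python
def percentage_distribution(amounts, decimals=1):
    """Each part as a rounded per cent of the total of all parts."""
    total = sum(amounts.values())
    return {k: round(100 * v / total, decimals) for k, v in amounts.items()}


def adds_to_100(per_cents, decimals=1):
    """True if the per cents add to 100 except for rounding.

    Each rounded figure may be off by half a unit in the last place,
    so the allowance grows with the number of parts.
    """
    allowance = len(per_cents) * 0.5 * 10 ** -decimals
    return abs(sum(per_cents) - 100) <= allowance + 1e-9


male_pupils = {"Normal": 55_735, "Defective": 184_818}
dist = percentage_distribution(male_pupils)
print(dist, adds_to_100(list(dist.values())))

# Rule 4: every percentage of change carries its sign.
for change in (-7.6, 0.3):
    print(f"{change:+.1f}")
```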

Importance of Including Original Data 

The last-named point is of importance in presenting any type of 
ratio. In a complete presentation of any subject the original data will, 
of course, appear in primary tables and need not necessarily be 
repeated in every derivative table. In a small summary table, how- 
ever, there is little danger of too great complication if the data and 
ratios are arranged in parallel columns. For example, in Table 32 
the meaning of the percentages would have been much more evident 
had two additional columns been given as follows: "Number of 
Families in State" and "Number of Families Living on Farms."

It should be remembered that the reader is rightly skeptical in 
accepting any statement of relationships that he cannot verify by 
making the computation himself. Table 36 contains a number of 
errors, but because the original data are given along with the ratios, 
it is possible for the reader to detect the errors and to correct them 
as well as to add his own interpretation. 

TABLE 36

AMOUNT OF PUBLIC DEBT DUE BEFORE AND AFTER JAN. 1, 1939, EXCLUDING
PRE-WAR, POSTAL SAVINGS AND UNITED STATES SAVINGS BONDS AND
SECURITIES ISSUED EXCLUSIVELY TO GOVERNMENT AGENCIES AND
TRUST FUNDS

                                                AMOUNT        P.C.
DIVISION                                         (IN           OF
                                               MILLIONS)      TOT.

JUNE 30, 1932
  Due before Jan. 1, 1939                     $10,870.7       60.2
  First Liberty bonds (1947), called 1935       1,933.2       10.7
  Due after Jan. 1, 1939                        5,258.8       29.1
    Total                                     $18,062.7      100

JUNE 30, 1933
  Due before Jan. 1, 1939                     $13,458.4*      53.3†
  First Liberty bonds (1947), called 1935       1,933.2        9.2
  Due after Jan. 1, 1939                        5,215.9       24.8
    Total                                     $21,028.4      100.0

JUNE 30, 1934
  Due before Jan. 1, 1939                      $3,458.4‡      53.3†
  First Liberty bonds (1947), called 1935       1,933.2        7.7
  Due after Jan. 1, 1939                        9,861.2       39.0
    Total                                     $25,252.8      100.0

JUNE 30, 1935
  Due before Jan. 1, 1939                     $10,000.8       38.3
  First Liberty bonds (1947), called 1935
  Due after Jan. 1, 1939                       16,093.9       61.7
    Total                                     $26,094.7      100.0

* Should read $13,879.3.
† Should read 66.0.
‡ Should read $13,458.4.

Two of the errors in Table 36 are typographical, such as may often
be found as a result of lack of careful proofreading. One of these
must be discovered in order to avoid misinterpreting the percentage
distributions (note also the incorrect caption of this column), whereas
the other is less serious. The first line of the distribution for June 30,
1933, as printed is really the first line of the distribution for June 30,
1934. As indicated in the footnote, it should read: Due before
January 1, 1939, $13,879.3; 66.0. The error can be discovered by noticing
that the percentage distribution for 1933 as printed adds to 87.3 per
cent instead of 100 per cent.

The second error occurs in the first line of the distribution of 
June 30, 1934, in which the amount is printed as $3,458.4 instead of 
$13,458.4. This error becomes obvious after the first one has been 
detected. If neither error were discovered, one would naturally read 
the table to mean that ten billion dollars of bonds had been retired 
or replaced by longer maturities during the fiscal year 1933-34. The 
corrected figures show that the reduction was only 421 million dollars. 
In this case there are more errors in the original data than in the 
per cents but they provide a check on each other. When ratios alone 
appear it is impossible to determine where the error lies. Except in 
the case of distributions totaling 100 per cent, the existence of error
would not be apparent unless the figure obviously disagreed with 
known conditions. 
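
The kind of verification a reader can make when amounts and per cents appear together is sketched below in Python, using the June 30, 1933, distribution of Table 36 both as printed and as corrected. The function name and the tolerance of 0.1 are assumptions made for illustration.

```python
def check_distribution(rows, printed_total, tol=0.1):
    """List the disagreements between printed amounts and per cents.

    `rows` holds (label, amount, printed per cent) for each part.
    """
    problems = []
    amount_sum = sum(amount for _, amount, _ in rows)
    if abs(amount_sum - printed_total) > tol:
        problems.append(f"amounts add to {amount_sum:,.1f}, not {printed_total:,.1f}")
    pct_sum = sum(pct for _, _, pct in rows)
    if abs(pct_sum - 100) > tol * len(rows):
        problems.append(f"per cents add to {pct_sum:.1f}, not 100")
    for label, amount, pct in rows:
        computed = 100 * amount / printed_total
        if abs(computed - pct) > tol:
            problems.append(f"{label}: printed {pct}, computed {computed:.1f}")
    return problems


as_printed = [
    ("Due before Jan. 1, 1939", 13458.4, 53.3),
    ("First Liberty bonds", 1933.2, 9.2),
    ("Due after Jan. 1, 1939", 5215.9, 24.8),
]
corrected = [("Due before Jan. 1, 1939", 13879.3, 66.0)] + as_printed[1:]

for problem in check_distribution(as_printed, 21028.4):
    print(problem)
print(check_distribution(corrected, 21028.4))
```

With the printed figures the check reports three disagreements, all traceable to the first line; with the corrected line it reports none. Had the per cents been printed alone, only the failure to add to 100 could have been detected.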

Referring again to Table 36, if the column giving the amount 
of the total debt were omitted, it would be very difficult by reading 
the per cents alone to comprehend the changes that have taken place. 
For example, the decline in the per cent of the debt due before 1939 
from 66 per cent in 1933 to 53 per cent in 1934 might be ascribed 
to refunding operations during the year. The amounts show that 
the decline in per cent of early maturity was caused mainly by an 
increase in the total debt resulting from the addition of some four and 
one-half billion of longer term bonds maturing after 1939. In contrast 
to this the further decline from 53 per cent in 1934 to 38 per cent in 
1935 of bonds maturing before 1939 was the joint result of a decline 
of about three and one-half billions in short-term maturities and an 
increase of over six billions in longer-term maturities. Thus we see 
how essential the amounts are in arriving at the proper interpretation 
of the changes in the per cents. 

Sometimes additional relationships can be derived from a given 
set of data. If the original data are not shown the reader is prevented 
from working out ratios which may be of more interest to him than 
those selected by the author. Full presentation of the original data 
is therefore evidence of good faith on the part of the author. The 
reader is free to check every statement and to work out his own 
interpretation. 

COMPARISONS BETWEEN RATIOS 

Large and unwieldy figures are reduced to ratio form chiefly 
because comparisons between two or more such ratios can be easily 
interpreted. Many relationships that are entirely obscured in the 
original data can be brought out through the correct use of compari- 
sons between ratios. In fact a comparison is so implicit whenever 
two or more related ratios are presented together that up to this point 
in the chapter it has been impossible to confine the discussion to single 
ratios and not to anticipate to some extent the relations that exist 
between the several ratios. 

Kinds of Comparisons 

These comparisons between ratios group themselves into two distinct 
types: (1) those between several ratios in a single series, all of which
have the same base, and (2) those between two or more separate 
ratios, in each of which the base is a different quantity or value. The 
first kind of comparison, with a very few exceptions, involves ratios 
between like items while in the second the ratios compared may be 
made up of like or unlike items. The methods used in the two kinds 
of comparison will be discussed separately. 

Ratios on the Same Base. There are two kinds of series in which 
several ratios are computed on the same base: (a) percentage dis- 
tributions of parts of a total and (b) index numbers in which successive 
values in a time series are expressed as per cents of an earlier year or 
some other normal base period. The primary purpose in presenting 
either of these types of ratios in a series is to show the importance of 
each individual item in relation to the base of 100 per cent. Again 
it should be emphasized that the construction of either type of series 
presupposes the homogeneity of the data. In a percentage distribution 
the separate parts must comprise a unified whole and in a time series 
the successive values must be identically defined from period to period. 

Percentage distributions: As was illustrated in Table 31, the expres- 
sion of each part as a per cent of the total makes it easy to estimate the 
relative importance of each part. Stating the several per cents, along 
with the 100 per cent base, usually expresses the comparison sufficiently 
without any further computation. The difference between any two 
such ratios may also be expressed by subtracting one from the other 
provided the relation of this difference to the 100 per cent base is clearly 
indicated. In Table 31, for example, "Only 5 per cent more of the 
total mortgages was held by savings-and-loan associations than by 
the group next in importance, banks and trust companies." If a direct 
relation between any two items had been of main importance without 
reference to the total, the percentage distribution need not have been 
made. It would have been simpler to divide the two items of original 
data. However, if only a percentage distribution is available without 
accompanying data, dividing one per cent by the other will give the 
relation between the two items, since the identical denominators cancel 
out. To express the importance of savings-and-loan associations
relative to banks and trust companies: $\frac{a}{1{,}437.3} \div \frac{b}{1{,}437.3}$ is equivalent to
$\frac{a}{b}$, where $a$ and $b$ are the original amounts for the two groups;
therefore dividing the two quotients, 30.1 ÷ 25.0, gives the
same result, 120 per cent. This operation, strictly speaking, should
not be considered as a comparison of ratios, but merely as a substitution
of the percentages for the original items in order to derive a simple
ratio between them. 

Index numbers of time series: The emphasis in this kind of series 
also is on the relation of each item to the base rather than on direct 
relations between any two of the several items. Since indexes are apt 
to be more readily available than the original data and comparisons 
may be wanted between two specific periods instead of comparing 
either one with the base period, there is frequent occasion for express- 
ing comparisons between two individual index numbers. The pro- 
cedure is exactly parallel to that already described for percentage 
distributions: the two values may be subtracted provided the dif- 
ference is stated as a per cent of the base period; or, as has already 
been mentioned in the discussion of percentage increase or decrease, 
the second may be divided by the first, the same as would be done 
with the original items. 

Ratios on Different Bases. In the preceding section dealing with 
ratios on a common base the comparisons of such ratios were made 
in the same direction as the computation of the original ratios. That 
is, the numerator items differed from their base with respect to a 
single characteristic and the comparisons between the ratios were 
concerned with these same differences. When, for example, the char- 
acteristic was time in years, the various ratios computed on a base 
year were compared only with respect to their differences from this 
base year. No additional differences of any kind were introduced in 
the comparisons. 

Comparisons between ratios that are on different bases involve more 
complex relationships since they are always cross-comparisons of the 
original ratios. The ratios compared may be made up of like items 
or of unlike items and in either case the comparisons will be concerned 
with differences in a characteristic that was not involved in the separate 
ratios. 

Ratios of like items: Classification according to Characteristic: Since 
single ratios between like items are classified as time, space, or attribute, 
the comparisons between such ratios according to a second character- 
istic become a cross-classification of these three kinds of characteristics. 
Figure 33 presents this cross-classification with an example of each 
kind of comparison. Each of the three main groups of comparisons 
between ratios includes the three kinds of single ratios according to 
characteristic, with the exception of space comparisons of space ratios. 



FIGURE 33

CLASSIFICATION OF COMPARISONS BETWEEN RATIOS
OF LIKE ITEMS WITH EXAMPLES OF EACH

KIND OF          KIND OF
COMPARISON       SIMPLE
BETWEEN RATIOS   RATIO           EXAMPLE

Time             1. Time         Ratios of the amount of magazine advertising in
                                 December to the amount in November, compared
                                 for a period of years.

                 2. Space        Ratios of the production of steel in Buffalo to
                                 the production in Cleveland, compared for each
                                 of 12 months of a given year.

                 3. Attribute    Percentage distributions by economic classes of
                                 the total value of United States exports,
                                 compared annually for several years.

Space            4. Time         Ratios of the indexes of department-store sales
                                 for December of a given year to December of the
                                 preceding year, compared by Federal Reserve
                                 Districts.

                 5. Space        (This combination is impossible because there
                                 cannot be cross-classification of spatial
                                 characteristics.)

                 6. Attribute    Ratios of low-priced car sales to total
                                 passenger-car sales in a given year, compared by
                                 main geographic divisions of the United States.

Attribute        7. Time         The percentages of increase or decrease in value
                                 of United States exports over a period of years,
                                 the changes being compared by economic classes.

                 8. Space        The percentage of sales in a given sales district
                                 to total sales by a wholesale hardware company
                                 during a given year, compared by types of product
                                 distributed by the firm.

                 9. Attribute    Ratios of cash to installment sales in a given
                                 month, compared by departments, in a large
                                 department store.

If the data for any of the examples in Figure 33 were set up in
tabular form, it would be clear that since each ratio has a different
base, there are no constant terms in the ratios that are being compared.
In each case, however, there are certain points in common
between the ratios which allow valid comparisons to be made between
them.

Tests of Comparability: Most important of these is the fact that 
the ratios are "like" in the same sense that original items are "like." 
That is, they are identically defined except for one characteristic. The 
numerators of the ratios being compared are like and their
denominators are like, each set differing according to an identical
classification which becomes the classification of the comparisons
between the several ratios.

A table showing the ratios used in Example 1, Figure 33, and in- 
cluding the original data, demonstrates the method by which the 
"likeness" of numerators and of denominators in any case may be 
determined and the consequent possibility of drawing comparisons 
between the respective ratios. In Table 37, the original ratios are each 
between two months in a given year, December to November, and 
these ratios are compared for several years. The headings and the 
title, as explained by notes in the original source, indicate that in each 
month and from year to year throughout the period the magazine 
lineage is measured by an identical method. For each month the data
represent from 80 to 85 per cent of all magazines in the United States,
the reports being compiled regularly by Printers' Ink. The numerators
in column 2 are therefore identically defined, and likewise the
denominators in column 1. Since the several numerators differ from 
one another only according to the stub classification, time in years, 
and the corresponding denominators follow the same classification, the 
resulting ratios in column 3 differ only according to this same char- 
acteristic. Hence the ratios are "like" and it is justifiable to draw 
comparisons between them: for example, to observe that for every 
year during this period, except 1935, the lineage has been smaller in 
December than in November, the declines ranging from .2 per cent 
to 26.3 per cent. 
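The arithmetic behind column 3 of Table 37 can be sketched in a few lines of Python. This is only an illustration: the figures are those printed in the table, while the names `LINEAGE`, `per_cent_difference`, and `changes` are introduced here and are not part of the text.

```python
# Table 37: magazine advertising lineage, in thousands of lines,
# keyed by year as (November, December).
LINEAGE = {
    1933: (1899, 1791),
    1934: (2317, 2136),
    1935: (2201, 2334),
    1936: (2736, 2731),
    1937: (2989, 2893),
    1938: (2251, 1658),
}


def per_cent_difference(november, december):
    """Column 3 of Table 37: (2) / (1) - 100%, to one decimal place."""
    return round(december / november * 100 - 100, 1)


# Every ratio has the same definition; only the year differs.
changes = {year: per_cent_difference(nov, dec)
           for year, (nov, dec) in LINEAGE.items()}

for year, change in changes.items():
    print(f"{year}: {change:+.1f} per cent")
```

Because numerator and denominator are defined identically in every row, the six results differ only by year and may be read against one another directly.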

TABLE 37

CHANGES IN MAGAZINE ADVERTISING LINEAGE
NOVEMBER TO DECEMBER, 1933-38 *

(thousands of lines)

               (1)           (2)               (3)
   YEAR      NOVEMBER      DECEMBER     PER CENT DIFFERENCE
                                         (2) ÷ (1) - 100%

   1933       1,899         1,791             - 5.7
   1934       2,317         2,136             - 7.8
   1935       2,201         2,334             + 6.0
   1936       2,736         2,731             - 0.2
   1937       2,989         2,893             - 3.2
   1938       2,251         1,658             -26.3

* United States Department of Commerce, Survey of Current Business,
1936 Supplement, p. 24; 1938 Supplement, p. 25; September, 1939, p. 23.

If there has been any change in definition the data cannot be given
under a single column heading without an explanation of the change,
in a footnote, which will indicate that the data are not comparable.
In Example 2, Figure 33, if the figures for steel production included
only the city limits for the first few months and the metropolitan
district for the remaining months, this fact would be noted and the
resulting ratios could not be compared throughout the year.

The following example illustrates how an invalid comparison may 
be made due to a concealed change in definition which could have 
been discovered if the data had been tested by a tabular analysis similar 
to Table 37. This statement appeared in the editorial columns of a 
city newspaper: 

CAMPUS-SHY CUPID 

X.Y.Z. University is no matrimonial bureau, say alumni officials of that 
institution. A recent survey [1937] of the classes from 1928 to 1935, in- 
clusive, shows that fewer than half of the coeds graduated from X.Y.Z. in 
those years have married. Of the alumnae who answered the questionnaire, 
the following percentages reported they had married: 1928, 54.3 per cent; 
1929, 46.8; 1930, 42.5; 1931, 41.4; 1932, 34.7; 1933, 30.2; 1934, 20.3; 
1935, 12.7. 

In tabular form, the data might have been somewhat as follows: 



                     (1)            GRADUATES MARRIED BY 1937
   YEAR OF         NUMBER            (2)              (3)
   GRADUATION    GRADUATING        Number      Percentage of Total

   1928              300             163              54.3
   1929              350             164              46.9
   1930              400             170              42.5
   1931              500             207              41.4
   etc.



A study of the stub and column headings shows that since all of the 
reports were made in the same year, 1937, there is no time comparison 
between the ratios. There was a time difference in the year of grad- 
uation, and this classification in the stub gives the appearance of a 
time comparison between the several rows. However, the heading of 
columns 2 and 3, "Married by 1937," indicates that in order to get 
the true definition of the terms of the ratios, the date of each class 
must be subtracted from this fixed date. The result becomes a
classification by attribute: the number of years since graduation, or
the successively shorter periods during which each class has been
exposed to the "hazard" of marriage. The ratios do show that for each
additional year since graduation a larger percentage of the members of
any class will have married, but this fact scarcely requires proof. If a
time comparison between ratios is desired, all the numerator items 
must be like in attribute, but the time of each ratio must be different. 
That is, each class must have been out of college for an equal length 
of time, the comparisons being made in successive years. If the ques- 
tion had been "Married within five years after graduation," then the 
resulting ratios would have been made in 1933, 1934, etc. Such data 
would probably give no clear evidence of either increase or decrease 
in the percentage of college women married. 
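The reclassification described above, from year of graduation to years since graduation, can be sketched as follows. The percentages are those quoted in the newspaper item; the names `SURVEY_YEAR`, `percent_married`, and `by_years_out` are assumptions made for this illustration.

```python
SURVEY_YEAR = 1937  # all reports were made in this one year

# Percentage of each class reported married, as quoted in the editorial.
percent_married = {
    1928: 54.3, 1929: 46.8, 1930: 42.5, 1931: 41.4,
    1932: 34.7, 1933: 30.2, 1934: 20.3, 1935: 12.7,
}

# Subtract the class year from the fixed survey date: the stub is really
# an attribute (years of exposure), not a time classification.
by_years_out = {SURVEY_YEAR - class_year: pct
                for class_year, pct in percent_married.items()}

for years_out in sorted(by_years_out):
    print(f"{years_out} years since graduation: {by_years_out[years_out]} per cent")
```

Read this way, the series shows only that longer exposure goes with a larger percentage married, which is the point made in the text.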

Another example will show that any violation of the previously 
discussed principles governing the construction of individual ratios 
will destroy the significance of comparisons between them. One of 
these principles stated that the possibility of further subdivision of 
data must be kept in mind before combining in ratio form two general 
totals that appear to be "like" in definition. The invalidity of such a 
ratio may not be apparent until it is compared with one or more 
similarly constructed ratios, with results that are contrary to known 
facts. A research bureau collected from 30 manufacturing and whole- 
sale concerns monthly data on the value of outstanding accounts and 
overdue accounts. From the totals of these 30 reports the ratio of 
overdue to outstanding accounts was computed as of the first of each 
month. However, when the July, 1937, ratio showed a noticeable 
increase over June, several of the concerns complained that the true 
situation was being misrepresented. 

Table 38, giving the collected data and accompanying ratios, shows 
what happened as a result of combining diverse elements in a single 
total. The ratio of overdue to outstanding accounts for all 30 firms 
increased from 20.4 per cent to 22.7 per cent between June and July, 
1937, due to a decrease in the denominator. This decrease in the total 
outstanding accounts can be charged entirely to the 6 food concerns. 
Their outstanding accounts showed a drop so great that it more than 
counteracted the slight increase shown by the other 24 concerns. The 
ratios of the food concerns were quite different from the other 24 in 
both months. The 6 food concerns were subsequently reported sepa- 
rately, thus eliminating dissatisfaction. 
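A short Python sketch, using the figures of Table 38, shows how the combined ratio can move while the larger group hardly changes. The label "24 other" and the names `ACCOUNTS`, `overdue_per_cent`, and `ratios` are chosen here for illustration only.

```python
# Table 38: (outstanding accounts, overdue accounts) in dollars.
ACCOUNTS = {
    "6 food": {"June": (173_901, 61_780), "July": (133_712, 70_904)},
    "24 other": {"June": (822_516, 141_307), "July": (836_410, 149_212)},
}


def overdue_per_cent(outstanding, overdue):
    """Ratio of overdue to outstanding accounts, in per cent."""
    return round(overdue / outstanding * 100, 1)


ratios = {}
for month in ("June", "July"):
    for group, months in ACCOUNTS.items():
        ratios[(group, month)] = overdue_per_cent(*months[month])
    # The combined ratio is the ratio of the totals, not an average
    # of the two group ratios.
    total_outstanding = sum(m[month][0] for m in ACCOUNTS.values())
    total_overdue = sum(m[month][1] for m in ACCOUNTS.values())
    ratios[("30 combined", month)] = overdue_per_cent(
        total_outstanding, total_overdue)

for key, value in ratios.items():
    print(key, value)
```

Computing the ratio separately for each group, as the bureau eventually did, keeps the movement of the 6 food concerns from being read as a change in all 30.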

TABLE 38

OVERDUE AND OUTSTANDING ACCOUNTS OF THIRTY CONCERNS
FOR JUNE AND JULY, 1937, AND THE RATIO OF OVERDUE TO
OUTSTANDING ACCOUNTS

                                OUTSTANDING    OVERDUE      OVERDUE ÷
   TYPE OF CONCERN     MONTH     ACCOUNTS      ACCOUNTS    OUTSTANDING
                                                               (%)

   6 food              June      $173,901      $ 61,780       35.5
                       July       133,712        70,904       53.0
   24 manufacturing    June       822,516       141,307       17.2
                       July       836,410       149,212       17.8
   30 combined         June       996,417       203,087       20.4
                       July       970,122       220,116       22.7

In this case, the numerator of each ratio consisted of 30 parts, each
of which had a very definite relation to a corresponding part of the
denominator. However, the 6 food concerns were so different from
the other 24 that when all were combined the result was a
heterogeneous total that did not correctly represent either of the
component groups. Consequently, neither the individual ratios between
these totals, nor the comparisons between the ratios, could be given any
definite interpretation. The only way to analyze such a situation is to
assemble the data, part by part, and to study each individual
relationship in order to discover which totals or subtotals should be
used in deriving ratio comparisons.

Interpretation According to Kind of Relationship: The eight exam- 
ples in Figure 33 afford illustrations of all kinds of ratios according 
to relationship: part to part, total to total, and part to total. Numbers 
1 and 4 are comparisons between part-to-part time ratios, the first 
being ratios between two corresponding parts of the same year com- 
pared for several different years, and the fourth being ratios between 
corresponding parts of two different years compared in different areas. 
In Number 9 the ratios are between two component parts of total 
sales, compared according to a second attribute, the different depart- 
ments of the store. The ratios in Number 2, compared for several 
successive periods, are between two cities in the United States and may 
therefore be regarded either as part-to-part or total-to-total rela- 
tionships. 

Numbers 6 and 8 are examples of comparisons between a set of
part-to-total ratios and Table 38 illustrates the same kind of
comparison. In this type of comparison there are as many separate
denominators as there are ratios being compared, all identically defined
but not identical in amount. Reduction to ratio form has in one sense
placed all the numerators on a common base of 100 per cent, but it
must be remembered that the numerators are now relative instead of
absolute amounts and hence are not subject to further computation
with the same freedom as were the original data. Such cases need to 
be carefully distinguished from comparisons between the several parts 
within a single percentage distribution. In the discussion of ratios 
on the same base it was demonstrated that such ratios could be sub- 
stituted for the original data in making computations. This is no 
longer true when the bases are different, since they do not cancel out 
when one ratio is divided by the other. 
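The point about cancellation can be written out symbolically. In the sketch below the symbols are introduced only for illustration and do not appear in the text: a_1 and a_2 stand for two numerators, b for a common base, and b_1 and b_2 for two different bases.

```latex
\[
\frac{a_1 / b}{a_2 / b} = \frac{a_1}{a_2}
\qquad\text{whereas}\qquad
\frac{a_1 / b_1}{a_2 / b_2} = \frac{a_1}{a_2}\cdot\frac{b_2}{b_1}
\]
```

With a common base the quotient of the two ratios equals the quotient of the original items. With different bases the factor b_2 / b_1 remains, so the ratios can stand in for the original data only when b_1 = b_2.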

Comparisons between several series of ratios, each series having a 
single base, present the same situation as sets of single part-to-total 
ratios except that they are more complex and afford greater oppor- 
tunities for misinterpretation. Numbers 3 and 7 from Figure 33 are 
examples respectively of comparisons between several percentage dis- 
tributions and between several time series, both examples having been 
derived from the same set of original data. 

Part A of Table 39 shows original data on United States exports 
cross-classified according to four years in time, and according to five 
subdivisions of the attribute, economic class. In Part B, several per- 
centage distributions according to economic class are given, one dis- 
tribution for each of the four years. These may be interpreted together, 
somewhat as follows: Finished manufactures comprise the largest 
share of United States exports and have been increasing in relative 
importance throughout the period, from 41.8 per cent in 1934 to 49.0 
per cent in 1937. Crude materials, the group second in importance, 
and manufactured foodstuffs have each meanwhile formed a continu- 
ally diminishing share of the total exports. Semi-manufactures and 
crude foodstuffs have maintained a fairly constant relative importance, 
except for the increase in semi-manufactures in 1937. 

If a table such as this one were read too hastily, the horizontal rows 
of per cents might easily be mistaken for index numbers on some 
earlier base. With this misinterpretation the first row would indicate 
that exports of crude materials had decreased in value each succeeding 
year, which is of course not the case. 

[TABLE 39. United States exports by economic class, 1934-37: Part A,
original data; Part B, percentage distributions by economic class;
Part C, indexes on the 1934 base; Part D, per cents of increase or
decrease. Table body not recoverable.]

The actual indexes, which appear in Part C, and the per cents of
increase or decrease in Part D, present an entirely different situation.
The total value of United States exports has increased every year since
1934, the total in 1937 being 57 per cent greater than in 1934. A net
increase for the four-year period has been reflected in every class of
exports. The greatest and most consistent increases in proportion to
1934 appeared in manufactures, semi-manufactures showing slightly
greater proportionate gain than finished manufactures. Manufactured
foodstuffs declined in 1935 and 1936, but in 1937 exceeded the base
year value by 6 per cent. Crude foodstuffs showed no change in 1935,
a slight slump in 1936, but in 1937 jumped to 178 per cent of the
1934 value. Crude materials rose to 105 in 1935, dropped to 102 the
following year, and in 1937 amounted to 111 per cent of the base
year value.

The interpretation of any set of parallel time series is full of hazards. 
Chief among these is a tendency to make cross-comparisons between 
the corresponding per cents in the various series as if they were abso- 
lute values instead of indicating that every change is in proportion to 
the base of its own series. Failure to read each change in relation to 
the base year may give the mistaken impression that the percentage 
of change refers to the year immediately preceding. 
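The second hazard can be illustrated with the crude-materials indexes quoted in the text (1934 = 100). The sketch below is only an illustration; the function names and the dictionary `crude_materials` are introduced here.

```python
# Crude-materials export indexes as quoted in the text, 1934 = 100.
crude_materials = {1934: 100, 1935: 105, 1936: 102, 1937: 111}


def change_from_base(series, year):
    """Per cent change relative to the base year (index = 100)."""
    return series[year] - 100


def change_from_previous_year(series, year):
    """Per cent change relative to the year immediately preceding."""
    return round(series[year] / series[year - 1] * 100 - 100, 1)


for year in (1935, 1936, 1937):
    print(year,
          change_from_base(crude_materials, year),
          change_from_previous_year(crude_materials, year))
```

The index of 111 for 1937 means 11 per cent above 1934, not 11 per cent above 1936; the change from 1936 works out to a different figure, and the two readings must not be confused.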

These two examples of comparisons between series of ratios have 
brought out two entirely different sets of relationships neither of 
which was very obvious from the original data. In neither case was it 
necessary to make any further computations in order to express the 
comparisons. When they are thus stated in general terms, or are im- 
plicit, these relationships are usually clear and can be readily under- 
stood because the corresponding ratios in the several series are all 
expressed as relatives of 100 per cent. 

Ratios of unlike items: There is no difference from the foregoing 
method in dealing with comparisons between ratios made up of unlike 
items. They correspond exactly to total-to-total attribute ratios between 
like items and may be compared according to the same characteristics, 
time, space, or attribute. Figure 34 contains an example of each of 
the three classifications. 

Since any single ratio between unlike items must have a numerator 
and denominator that are identically defined in time and space, the 
most common comparisons between several such ratios will be with
respect to differences in either one of these two characteristics. In
such cases the numerators of all the ratios will be classified according 
to either time or space, their respective denominators will have the 
same classification, and this will become the classification of the ratio 
comparisons, as in Examples 1 and 2 of Figure 34. 

FIGURE 34

CLASSIFICATION OF COMPARISONS BETWEEN RATIOS OF UNLIKE ITEMS,
WITH EXAMPLES OF EACH

(Kind of comparison between ratios; example)

   1. Time. Amount of income tax paid per capita in the United States,
      compared annually over a period of years.

   2. Space. Ratios of turnover per store for a retail clothing chain
      during a given year, compared for 10 different cities.

   3. Attribute. Ratios of new orders received by a manufacturing
      concern to its shipments of finished goods during a given month,
      compared by type of goods.

Ratios between unlike items are also sometimes compared with
respect to an attribute, but since unlike items do not ordinarily possess
a common set of attributes this is a situation less frequently found. In
accounting practice, however, the unlike items may often be
overlapping parts of two different attribute classifications of the same
thing. These attributes have a relation to one another although they
are not mutually exclusive. However, since both terms of such ratios
are expressed in the same unit they are subject to a cross-classification
according to some other attribute of the common unit. Number 3 is
such an example: new orders and shipments are items that overlap,
but both may be classified according to type of goods, and the ratios
between them may be compared according to that attribute.

In other instances series of ratios between unlike items may occur 
in which all have the same denominator, the numerators being differ- 
entiated according to an attribute. Examples are: death rates in the 
United States for a given year according to specific causes of death; 
production of wheat per acre under identical conditions according to 
grade of seed sown; and the per capita consumption of beef, mutton, 
and pork in the United States. A number of such series, each on a 
single base, may be compared with respect to differences in any other 
characteristic common to both terms of the ratios. The deaths from 
various causes might be separated according to sex and the two sets 
of rates compared; the production of wheat from the different grades 
sown might be tested in several states to compare the effects of varying 
climatic conditions; and changing habits in per capita consumption of 
the three kinds of meat might be compared over a long period of 
years. The interpretation of comparisons between such series of ratios 
would be similar to the interpretation of series of ratios between like
items that was illustrated in Table 39.

It will be recalled that ratios between unlike items are not expressed 
as per cents except in certain instances when both items have the same 
unit. The column heading of a set of ratios names the values below 
it as a certain number of the numerator unit per 1, 1,000, 100,000, 
etc., of a given denominator, as "per capita consumption, in pounds,"
"average individual income tax, in dollars," or "number of automobile
deaths per 10,000,000 gallons of gasoline consumed." Consequently,
the ratios in the table appear in the same unit as some of the original 
data, and it is more difficult to remember that they are relatives than 
when they appear in the form of per cents. If too many different com- 
binations of these ratio relationships between unlike items are given 
in a single table, the resulting confusion may be as great as when 
too many kinds of per cents are used together. There is also the 
same necessity for guarding against changes in definition of numerators 
or denominators during the course of the comparisons. The danger of 
comparing heterogeneous totals must likewise be avoided, since unlike 
items that are selected as having a relation to each other may not have 
been given limitations sufficiently specific. Some of these more complex 
problems involved in comparisons between ratios will be dealt with in 
the next chapter. 

Averaging Ratios 

Averaging ratios is one method of comparing them. However, the 
principles involved are sufficiently different from those discussed in 
preceding pages to warrant a separate presentation. There are two 
rules that must be observed: (1) ratios cannot be averaged unless
they are comparable in every respect; and (2) whenever ratios are 
averaged they must be weighted according to their relative importance. 4 

Comparability. This principle carries over directly from the earlier 
discussion of comparability of the terms of individual ratios and 
comparability between ratios. The data must be homogeneous for the 
purpose at hand, and the numerators and denominators must retain 
the same definitions throughout. 

In Table 40, column 3, the yield per acre of rye in the United
States is given for three successive years, with the average yield for
the three years combined. These data might well be considered as too
general for some purposes, but for the purpose of comparing the
United States with other countries they are specific enough. The
arrangement of the original data in columns 1 and 2 indicates that the
definitions have remained constant: no change has occurred in the units
of measure, acres and bushels; the year in each case is the calendar
year and not the crop year; and the total acreage and the total
production have been secured from all the rye-producing states by
methods of reporting that are essentially the same from year to year.
Therefore the average yield for the three years, 11.4 bushels per acre,
is valid from the point of view of comparability of the ratios from
which it was computed.

4 This is intended to apply primarily to the use of the arithmetic
average. A geometric average is often computed without weights.

TABLE 40

PRODUCTION, ACREAGE, AND YIELD PER ACRE OF RYE
IN THE UNITED STATES, 1933-35 *

                (1)              (2)               (3)
   YEAR      PRODUCTION     ACRES HARVESTED    YIELD PER ACRE
             (1,000 bu.)     (1,000 acres)         (bu.)

   1933        21,150            2,349               9.0
   1934        16,045            1,942               8.3
   1935        57,936            4,063              14.3

   Total       95,131            8,354              11.4

* Agricultural Statistics, 1936, p. 25, United States Department of
Agriculture.

Weighting. The relative importance of ratios is determined by
the values of their respective denominators. If ratios are weighted by
their denominator values and then averaged, the result is the same as
a ratio between the sums of the original data of numerators and
denominators. The latter method was used in Table 40:

      95,131 ÷ 8,354 = 11.4

Whenever all the original data are available this is a much simpler
procedure than weighting and averaging the individual ratios, as
follows:

       9.0 × 2,349 = 21,150
       8.3 × 1,942 = 16,045
      14.3 × 4,063 = 57,936
             -----   ------
             8,354   95,131

      95,131 ÷ 8,354 = 11.4 = weighted average
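The equivalence stated above can be checked numerically. The following Python sketch uses the production and acreage figures of Table 40; the variable names are assumptions made for illustration, and the simple mean is included only for contrast.

```python
import math

# Table 40: (production in 1,000 bu., acres harvested in 1,000 acres).
rye = {1933: (21_150, 2_349), 1934: (16_045, 1_942), 1935: (57_936, 4_063)}

# Yield per acre for each year, left unrounded.
yields = {year: bushels / acres for year, (bushels, acres) in rye.items()}

total_acres = sum(acres for _, acres in rye.values())
total_bushels = sum(bushels for bushels, _ in rye.values())

# Weight each ratio by its own denominator (acres), then average.
weighted_average = sum(yields[year] * acres
                       for year, (_, acres) in rye.items()) / total_acres

# Ratio between the sums of the original data.
ratio_of_totals = total_bushels / total_acres

# Unweighted mean of the three yields, for contrast only.
simple_mean = sum(yields.values()) / len(yields)

print(round(weighted_average, 1), round(ratio_of_totals, 1),
      round(simple_mean, 1))
```

The weighted average and the ratio of totals agree; the simple mean does not, because the 1935 acreage is roughly twice that of either earlier year.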



Approximate relatives in round numbers may be used as weights
instead of the actual denominator values, causing practically no
difference in the average. The acreages for the three years are in the
approximate proportion of 7, 6, and 12. Using these weights and
dividing by the sum of the weights gives the following result:

       9.0 ×  7 =  63.0
       8.3 ×  6 =  49.8
      14.3 × 12 = 171.6
             --   -----
             25   284.4

      284.4 ÷ 25 = 11.4 = weighted average



It is therefore possible to average ratios for which no accompanying 
data are available provided that there is available instead a set of 
weights proportionate to the values of the actual denominators. If no 
information is at hand concerning the relative importance of the de- 
nominators, the ratios cannot be averaged. 
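A minimal sketch of averaging with proportionate weights, assuming only the rounded yields and the round-number weights 7, 6, and 12 given above. The helper `weighted_mean` is a name chosen here, not one used in the text.

```python
import math


def weighted_mean(ratios, weights):
    """Average of ratios, each weighted by its relative importance."""
    return sum(r * w for r, w in zip(ratios, weights)) / sum(weights)


yields_per_acre = [9.0, 8.3, 14.3]   # bushels per acre, 1933-35
round_weights = [7, 6, 12]           # approximate proportions of acreage

average = weighted_mean(yields_per_acre, round_weights)

# Only the proportions matter: scaling every weight leaves the average
# unchanged.
scaled = weighted_mean(yields_per_acre, [w * 100 for w in round_weights])

print(round(average, 1))
```

Since only the proportions enter the result, any set of weights proportionate to the true denominators serves as well as the denominators themselves.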

The only case in which a simple average of ratios will give a correct 
result is when all the denominators are of equal importance, that is, 
when they are identical in value. However, this constitutes no excep- 
tion to the rule that weighting is always necessary. Such an average 
is not unweighted, but all the ratios have been given equal weights. 

Using some other set of weights in place of the denominators of 
the ratios will not give a valid average. 5 Some other factors may 
appear to be of importance but the average will be distorted unless 
such factors are combined with the separate denominators before the 
original ratios are computed. 

Table 41 shows in Part A an incorrect weighted average, and in 
Part B the same data averaged by the correct use of weights. The 
weights used in Part A are the numbers of workers at each wage 
rate. Obviously these are of importance in determining average wage 
increases, but if so they must be incorporated in the denominators of 
the original ratios. Instead of being merely wage rate, the numerator
and denominator of each ratio should be payroll, that is, rate × number
of employees, as in Part B.

5 An exception to this rule is explained in connection with Table 74,
page 394.

[TABLE 41. Part A, incorrect weighting (weights: number of employees);
Part B, correct weighting (weights: 1926 payroll). Table body not
recoverable.]

Since the number of workers at each rate is assumed to be the same
in 1936 as in 1926, the per cents of change in the payroll (Part B,
column 6) are the same as the per cents of change in wage rates (Part
A, column 3). The weights, however, are in different proportion since
in Part B the numbers of workers have been multiplied by varying
wage rates as of 1926. Reduced to a common base of 100 the weights
in Part A would be 40, 32, and 28, but in Part B they would be 13,
51, and 36. Consequently the average of the weighted ratios in Part B
differs from that in Part A. The average in Part B is the correct one
because it coincides with the ratio of the total original data (total
column 5 ÷ total column 4, - 100%). Similar proof cannot be applied
to the method used in Part A. The total of column 2 divided by the
total of column 1 minus 100 per cent equals +36 per cent whereas the
weighted average is +29.8 per cent.
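The difference between the two weightings can be sketched with a small invented example. The wage rates and employee counts below are hypothetical and are NOT the figures of Table 41; they are assumptions chosen only to show that weighting by base-year payroll reproduces the ratio of the totals while weighting by number of employees does not.

```python
import math

# Hypothetical data (not from Table 41): hourly rates in cents and the
# number of employees at each rate, assumed unchanged between years.
rates_1926 = [50, 100, 200]
rates_1936 = [60, 130, 220]
employees = [40, 30, 10]

# Per cent increase in each wage rate.
increases = [(new / old - 1) * 100
             for old, new in zip(rates_1926, rates_1936)]

payroll_1926 = [r * n for r, n in zip(rates_1926, employees)]
payroll_1936 = [r * n for r, n in zip(rates_1936, employees)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Weights that are not the denominators of the ratios being averaged.
by_headcount = weighted_mean(increases, employees)

# Weights equal to the denominators (1926 payroll).
by_payroll = weighted_mean(increases, payroll_1926)

# Check against the ratio of the total original data.
ratio_of_totals = (sum(payroll_1936) / sum(payroll_1926) - 1) * 100

print(round(by_headcount, 1), round(by_payroll, 1),
      round(ratio_of_totals, 1))
```

In this invented case only the payroll-weighted average coincides with the change in total payroll, which is the test of correctness applied in the text.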

It might be argued that in an actual case the number of workers at 
each wage rate would not remain the same over a period of 10 years. 
In such cases the payrolls can be determined for each year and, if the 
ratios between them are weighted by the denominators as in Part B, 
the average will again agree with the ratio between the total original 
data. The usefulness of this average representing total payroll increase 
may be open to question. There might be no change at all in wage 
rates, or even in total number of employees, but a shift in the distribu- 
tion of personnel at the various rates would cause an increase or de- 
crease in the total payroll. 6 An analysis of each rate separately would 
probably be of greater value. However, the average of changes in 
total payrolls as shown in Table 41 is of practical value in calculating 
the effects of planned payroll changes for a given number of employees. 

6 This question is developed in greater detail under standardized
ratios in the next chapter, and weight bias is discussed in connection
with index numbers in chapter XIX.

PROBLEMS

1. For each of the following pairs of items, compute the ratio and (1) state
the relation in words; (2) give reasons for selecting the item you used
as the base; (3) justify in terms of the text the number of units used
in the base.

a) Total tonnage of steel produced in 1938: 18,692,937 gross tons
   Total tonnage of steel consumed by the automotive industry in 1938:
   3,155,906 gross tons

b) Number of commercial banks in the United States reporting retail
   installment paper in their portfolios as of Dec. 30, 1939: 10,382
   Amount of retail installment paper held: $541,367,000

c) Bales of cotton produced in the United States in 1938: 12,008,000
   Bales of cotton produced in Brazil in 1938: 1,877,000

d) The average weekly wages of steel workers in the United
   States was $35.90 in 1929 and $29.40 in 1939.

e) Population in United States registration area, 1930: 118,560,800
   No. of deaths from diphtheria, 1930: 5,822

2. From any issue of the World Almanac select ratios illustrating each of the
three sizes of base explained in the text. What is the relation between
the numerator and the denominator of each ratio?

3. a) In 1938 the sales per salesperson in 8 department stores located in 
cities with population less than 20,000 was $8,500; for 30 stores located 
in cities with population over 1,000,000 the corresponding figure was 
$18,000. 

b) In 1935 the per capita sales of drugstores in Cleveland were 110 per 
cent of the sales in Detroit. 

c) In 1938 the average amount of each dollar of revenue set aside by 
Class I railroads to pay taxes was 9 cents. 

To what extent do these examples conform to the rules stated in the text 
for ratios between unlike items? 

4. Given the following ratios: 

a) Dollars paid by industrial consumers of electricity to the number of 
kilowatt hours consumed during a single year in a given industrial 
area. 

b) Bank debits in New York City to bank debits in the rest of the United 
States during a given month. 

c) Average number of active spindles in cotton textile manufacturing in 
the New England area this year compared with the corresponding 
figure for last year. 

d) Time deposits of commercial banks to demand deposits of those banks 
as of a certain call date. 

e) Sales of Chevrolet passenger automobiles to sales of Ford passenger 
automobiles last year in the state of California. 

f) Imports of wheat into the United States from Canada to production of
wheat in the United States for last year.

g) Number of deaths caused by industrial accidents in New York State in
January, 1940, to the number in January, 1941.

Classify these ratios in three ways: (1) like items or unlike items, (2)
time, space, or attribute, (3) part-to-total, part-to-part, or total-to-total.

5. From all the families with incomes of $1,000 and over in the following 
table, what per cent have no automobiles? Show method of computation. 

SELECTED FAMILIES IN PORTLAND, OREGON, WITH INCOMES OF
$1,000 AND OVER, 1933

   INCOME GROUP        NUMBER OF       PERCENTAGE NOT HAVING
                       FAMILIES           AN AUTOMOBILE

   $1,000-1,499          1,426                34.2
    1,500-1,999          1,068                25.0
    2,000-2,999            701                16.7
    3,000-4,999            300                13.0
    5,000-6,999             45                 2.2
    7,000 and over          27                11.1



6. Given the following information concerning deposits and depositors in 
mutual savings banks and postal savings in 1932 (000 omitted for all 
figures) : 



                    MUTUAL SAVINGS BANKS           POSTAL SAVINGS
STATES              Deposits       Depositors      Deposits     Depositors

United States       $10,040,000      12,700        $783,000       1,540
New York              5,287,000       5,900          82,000         200
Minnesota                63,000         100          20,000          40





a) Compute whatever ratios you consider necessary to compare methods of 
saving from the data given. 

b) Write a statement of your findings. 

7. Each student will be given one assignment for each part of this problem. 
Answer either (a), (b), (c), or (d), throughout. 



A. CIGARETTE CONSUMPTION IN THE UNITED STATES
(Millions)

July, 1933    9,526        July, 1936   14,801
July, 1934   11,355        July, 1937   15,290
July, 1935   13,138

Compared with the same month of the previous year, compute the
percentage relation and percentage change.

(a) 1934 with 1933        (c) 1936 with 1935
(b) 1935 with 1934        (d) 1937 with 1936



B. The following data show the amount of retail trade (in millions 
of dollars) in Buffalo in 1935 for 8 lines of trade (column 9 being 
"all others") and the total: 



 (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)    Total

54.9   37.8   22.9   26.0    8.6    6.8   17.3    6.3   24.8    205.4



Express the relation between the 2 items assigned to you, and the per 
cent which each is of the total retail sales. 

(a) Column 1, food, and column 7, eating and drinking places

(b) Column 2, general merchandise, and column 8, drugstores

(c) Column 3, apparel, and column 4, automotive

(d) Column 5, furniture and household goods, and column 6, lumber,
building, and hardware

C. Number of automobile fatalities and millions of gallons of gasoline 
consumed by motor vehicles in four states, 1934: 







                                    N. Y.    TEXAS     IOWA    N. H.

No. of fatalities                   2,903    1,579      531      104
Millions of gallons of gasoline     1,501      875      404       71

Express the relation between gasoline consumption and deaths from
automobile accidents in: (a) New York, (b) Texas, (c) Iowa, (d) New
Hampshire.

8. Are there any of the ratios in the following that you cannot interpret? 
If so, explain what additional information is needed in order to draw 
valid conclusions from the ratios. 

PRODUCTION OF MAPLE SUGAR AND SYRUP IN
THREE LEADING STATES, 1937 AND 1938

                           AVERAGE TOTAL PRODUCT PER TREE       PERCENTAGE
                           As Sugar             As Syrup        OF UNITED
                           (pounds)             (gallons)       STATES
                           1937      1938       1937     1938   PRODUCTION

Vermont                     1.5       2.3        .19      .29      37.9
New York                    1.8       1.7        .22      .21      25.7
Ohio                        2.7       1.9        .34      .24      15.3
United States               1.8       2.0        .23      .25     100.0
Average price               .290      .283      1.60     1.61
Percentage of 1925-29
  average                 101.0      98.6       77.7     78.2



9. The following is quoted from the report of a tobacco manufacturer to the 
stockholders: "Government figures, with our own figures, prove that our 
Company obtained during the first ten months of 1940 .... 59.93%
of all the cigarette increase of the entire industry."

What additional data would be needed in order to determine the importance 
of this report? 

10. Locate each of the three sets of ratios of problem 7 according to Figures 
33 and 34, and state which of the simple ratios are part-to-part, part-to- 
total, or total-to-total relations. 

11. Describe two separate methods of computing the average per cent living 
on farms in Table 32, page 241. State exactly what data would be needed 
for each computation. 

12. a) Compute the percentage change in average value per contract of non- 
residential building contracts, from the following data: 



            1937                             1938
NUMBER          COST             NUMBER          COST

124,305     $564,961,000         116,993     $567,069,000



b) With the following additional information discuss how much
significance can be attached to your original ratio:

TYPE OF NON-RESIDENTIAL                   1937                       1938
CONSTRUCTION                       NUMBER       COST          NUMBER       COST

Private garages and sheds          96,514   $ 27,423,000      91,147   $ 23,798,000
Other private construction         26,711    413,072,000      24,497    402,762,000
Public works, public buildings
  and utilities                     1,080    124,466,000       1,349    140,509,000



13. An industrial plant had the following number of employees at 2 different 
periods, with their respective total weekly payrolls. 



                               1925                           1928
TYPE OF            Aver. Number   Aver. Total      Aver. Number   Aver. Total
EMPLOYEES          Employed       Weekly Payroll   Employed       Weekly Payroll

Administrative
  and clerical         194         $ 9,409.00          156         $ 7,566.00
Skilled                320          14,630.40          235          10,744.20
Unskilled              608          13,254.40          731          15,935.80

Total                1,122         $37,293.80        1,122         $34,246.00



A union of unskilled workers used the foregoing totals to prove that there 
had been a decrease in wages of 8.2 per cent. Discuss. 

REFERENCES 

BAILEY, WILLIAM B., and CUMMINGS, JOHN, Statistics. Chicago: A. C.
McClurg & Co., 1917.

Chapter VI lays down the pattern from which all subsequent work on 
ratios has been built. 

BURGESS, ROBERT W., Introduction to the Mathematics of Statistics. Boston: 
Houghton Mifflin Co., 1927, chapter II. 

JEROME, HARRY, Statistical Method. New York: Harper & Bros., 1924,
chapter VIII.

CROXTON, FREDERICK E., and COWDEN, DUDLEY J., Applied General Statistics. 
New York: Prentice-Hall, Inc., 1939, chapter VII. 

CROXTON, FREDERICK E., and COWDEN, DUDLEY J., Practical Business Statistics. 
New York: Prentice-Hall, Inc., 1934, chapter VII. 

RIEGEL, ROBERT, Elements of Business Statistics. New York: D. Appleton- 
Century Co., 1927, chapter IX. 



CHAPTER XII 
APPLICATIONS OF RATIOS 

THE basic principles and uses of ratios were discussed in the
preceding chapter. Some more complicated cases of ratio
analysis omitted there are presented in this chapter. The subjects
included fall into three groups: (1) refined ratios and their
application in standardized form; (2) compound ratios and the conditions
for their interpretation; (3) some types of ratios used in particular
fields of business.

REFINED RATIOS 

General 

In a refined ratio special care is exercised to define both numerator 
and denominator so as to exclude whatever extraneous factors tend 
to obscure the direct relationship between the two items. The ad- 
vantage of the refined ratio lies in the opportunity of selecting in the 
denominator the item or items that are directly related to the numera- 
tor. Thus in vital statistics the ratio of measles cases to population 
under 16 years of age conveys much more information than the ratio 
of measles cases to total population. In the latter case a decline in 
the ratio over a period of years might be the result of an increase in 
the number over 65 years of age in the population, whereas the in- 
cidence of measles among those likely to contract the disease may have 
been unchanged. 

The ratio of labor cost in a factory to total cost of manufacture is a 
useful figure, but the denominator contains two kinds of cost, fixed cost 
and variable cost. The ratio of labor cost (a variable cost) to total 
variable cost gives a figure which is more valuable to management in 
analyzing operations. In the same way safety departments of manufac- 
turing plants get an over-all accident rate for the entire plant by taking 
the ratio of employees injured to number employed. This figure is of 
value only as a summary. The danger of accidents varies from one 
department to another both in frequency and severity. In a steel plant 
the tipping of a ladle in the furnace room is of infrequent occurrence 
but usually results fatally to workmen in the path of the hot metal. On
the other hand, men engaged in piling steel rods for storage may be
involved in injuries rather frequently but the injuries are seldom fatal. 
Similar contrasts can be made between departments in any type of 
manufacturing operation. To take account of these variations safety 
men compute the ratio, accidents to employees, for each department 
separately. 

Both the numerator and denominator of each departmental ratio are
refined further in order to facilitate the study of accidents. The
resulting ratio, known as the accident severity rate, is the number of
days' work lost through accidents¹ divided by the number of equivalent
full-time days worked.² The rate may be expressed in any unit of time,
per week, per month, or per year.

The study of deaths in automobile accidents furnishes another ex- 
ample of the use of refined ratios. Columns 1 and 2 of Table 42 con- 
tain the number of persons killed in automobile accidents and the 
population of the United States yearly from 1930 to 1938. The number 
of persons killed in automobile accidents per million population is 
given in column 5. The steady rise of this ratio from 1930 to 1937, 
interrupted only in 1932 and 1933, is the basic fact of the so-called 
"automobile menace." The marked decline of the ratio in 1938 appears 
to be the first evidence of abatement of the "menace." 

That this conclusion is premature can be seen from further study 
of available information. The ratio of fatalities to population takes 
no account of the changed hazard resulting from increase in the 
number of automobiles on the highways. This factor is included in 
the ratio of fatalities to automobiles registered, shown in column 6. 
This ratio fluctuates irregularly from 1930 through 1933, rises to a 
high point in 1934, and except in 1937 has declined since that time. 
The 1938 ratio of 109 fatalities per hundred thousand automobiles 
registered is the lowest recorded during the nine-year period. This 
ratio indicates that the increased death rate after 1934 was not
proportionate to the increased hazard represented by the number of
automobiles in use. In fact, the decline in fatalities per car since 1934
may be evidence of increased caution on the part of drivers.

¹ The number of days' work lost can be counted for temporary accidents but not for
death, permanent disability, or permanent impairment. Even temporary disabilities such
as the loss of a finger will lead to different numbers of days' work lost. Consequently
standards have been established for each type of accident. Thus, according to one standard
6,000 days are allowed for death, 4,000 days for loss of an arm, 1,200 days for loss of a
thumb and one finger, etc. Through the use of these standards, accident severity can be
measured as between departments of a plant, between plants or between industries, as well
as for different time periods. United States Bureau of Labor Statistics Bulletin No. 234,
The Safety Movement in the Iron and Steel Industry, p. 278.

² The number of equivalent full-time days worked is obtained by dividing the total
number of man-hours worked during a given period by the standard working hours per day.

But there is another factor to be considered. Cars are being driven 
more miles in recent years; hence the exposure to accident is increased. 
The ratio, number of deaths per 100,000,000 gallons of gasoline con- 
sumed each year, as shown in column 7, allows for the increased 
mileage of cars. If the average number of miles obtained per gallon of 
gasoline had remained fixed, then the number of gallons of gasoline 
consumed would have a constant relation to the number of miles cars 
were driven. The average number of miles per gallon has probably 
changed slightly during this nine-year period, but accurate information 
on this point is not available. Taking column 4 as a legitimate substi- 
tute for miles driven, we find that the ratio of deaths per 100,000,000 
gallons of gasoline consumed has declined sharply from the high point 
reached in 1934. The ratios in columns 6 and 7 show that the increase 
in the automobile death rate through 1937 is due to the increased ex- 
posure of the population to the hazard of automobile accidents in spite 
of a decline since 1934 in deaths per motor vehicle registered and per 
gallon of gasoline consumed. 
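The three refined ratios discussed above can be recomputed directly from the first four columns of Table 42. The following is a minimal sketch in Python; the function and variable names are illustrative only and are not part of the original text. The only point requiring care is the units: population and motor vehicles are stated in thousands, gasoline in millions of gallons.

```python
# Figures transcribed from Table 42, columns 1 to 4.
# Each row: year, persons killed, population (thousands),
# motor vehicles registered (thousands), gasoline consumed (millions of gallons).
TABLE_42 = [
    (1930, 32540, 123091, 26545, 14751),
    (1931, 33346, 124113, 25814, 15408),
    (1932, 29196, 124974, 24115, 14250),
    (1933, 31078, 125770, 23874, 14224),
    (1934, 35769, 126626, 24952, 15292),
    (1935, 36023, 127521, 26231, 16264),
    (1936, 37500, 128429, 28166, 17855),
    (1937, 40300, 129257, 29705, 19218),
    (1938, 32000, 130215, 29486, 19610),
]


def refined_ratios(killed, population_thousands, vehicles_thousands, gasoline_millions):
    """Return fatalities per million population, per hundred thousand
    motor vehicles, and per 100 million gallons of gasoline."""
    per_million_population = killed / (population_thousands * 1_000) * 1_000_000
    per_100k_vehicles = killed / (vehicles_thousands * 1_000) * 100_000
    per_100m_gallons = killed / (gasoline_millions * 1_000_000) * 100_000_000
    return per_million_population, per_100k_vehicles, per_100m_gallons


for year, *figures in TABLE_42:
    pop_rate, vehicle_rate, gasoline_rate = refined_ratios(*figures)
    print(f"{year}: {pop_rate:.0f}  {vehicle_rate:.0f}  {gasoline_rate:.0f}")
```

Printed to the nearest whole number, the three figures for each year correspond to columns 5, 6, and 7 of the table. The same function could be applied to any narrower denominator, such as miles driven, if such data were available.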

This conclusion does not in any way affect the desirability of a
reduction in automobile fatalities. It does seem to indicate that, on the
average, drivers maintain better control of cars in recent years and that
further attempts to reduce the automobile accident toll will depend
upon the compilation and study of more detailed information concerning
the accidents that occur. This most likely means ratios subdivided
for day and night accidents, accidents occurring on city streets and on
open highways, accidents related to age of driver and age of motor
vehicle, etc.

TABLE 42

FATALITIES IN AUTOMOBILE ACCIDENTS RELATED TO POPULATION,
MOTOR VEHICLES REGISTERED, AND GASOLINE CONSUMED, 1930-38

          (1)*         (2)†         (3)‡         (4)‡          (5)          (6)           (7)
                                                NO. OF                   FATALITIES    FATALITIES
        PERSONS                    NO. OF       GALLONS     FATALITIES   PER HUNDRED   PER 100
        KILLED IN    POPULATION    MOTOR        OF GASOLINE PER MILLION  THOUSAND      MILLION
YEAR    AUTOMOBILE   (000          VEHICLES     CONSUMED    POPULATION   MOTOR         GALLONS OF
        ACCIDENTS    omitted)      LICENSED     (000,000    (1) ÷ (2)    VEHICLES      GASOLINE
                                   (000         omitted)                 REGISTERED    CONSUMED
                                   omitted)                              (1) ÷ (3)     (1) ÷ (4)

1930      32,540      123,091       26,545       14,751        264          123           221
1931      33,346      124,113       25,814       15,408        269          129           216
1932      29,196      124,974       24,115       14,250        234          121           205
1933      31,078      125,770       23,874       14,224        247          130           218
1934      35,769      126,626       24,952       15,292        282          143           234
1935      36,023      127,521       26,231       16,264        282          137           221
1936      37,500      128,429       28,166       17,855        292          133           210
1937      40,300      129,257       29,705       19,218        312          136           210
1938      32,000      130,215       29,486       19,610        246          109           163

* Figures compiled and published by the Travelers Insurance Company, Hartford, Connecticut.
† Statistical Abstract, 1938, p. 10.
‡ Mimeographed releases of the United States Department of Agriculture, Bureau of Public
Roads. Gasoline consumption is by motor vehicles only.

Standardized Ratios 

Standardization of ratios consists in separating an over-all ratio 
into several mutually exclusive parts and computing a new com- 
bined ratio in the form of a weighted average of the several part 
ratios. The weights selected are a distribution of the denominators of 
the several part ratios according to some accepted standard, and these 
weights are held constant throughout any series of ratios that are being 
compared. 

The use of standardized ratios originates in the field of vital statistics
where standardized and corrected death rates, birth rates, etc., are
employed in comparisons between different cities or sections of the
country. The crude death rates of two cities may differ because of a
difference in the age composition of the two populations although the
death rate for each age group is identical in the two. The effect of
variation in the age composition of different populations can be
adjusted by the use of either standardized or corrected death rates.³

³ Standardized and corrected rates are computed differently and lead to different
results. Particular circumstances will determine which should be used in a given case.
The details of both computation and interpretation are well presented in Raymond Pearl,
Medical Biometry and Statistics (Philadelphia: W. B. Saunders Co., 1923), pp. 195-207.

These methods can be transferred advantageously to the field of
business ratios. In most cases the concept of the corrected rate rather
than the standardized rate is applicable to business situations. What
are known as "standardized ratios" in business applications are the
equivalent of "corrected rates" in vital statistics. The business usage
is defensible because the expression "corrected rates" is likely to
convey the impression that the unstandardized rates contain errors. Such is,
of course, not the case, but the standardization leads to more precision
in the interpretation of results.

Department Store Example. The experience of a department store
will serve to illustrate the standardization of ratios. The officials were
using the amount of the average sales check in four selected
departments combined as a quick evidence of changes in business conditions.
The total dollar value of sales and the number of sales were subject
to wide seasonal swings but the ratio of the two, the amount of the
average sales check, exhibited little seasonal influence. In the past this
ratio had proved to be a very sensitive indicator of approaching
business depression or recovery. Table 43, column 3, shows the ratio for
August and September, 1936.

TABLE 43

SALES, NUMBER OF SALES CHECKS, AND AMOUNT OF AVERAGE SALES CHECK
FOR A DEPARTMENT STORE IN AUGUST AND SEPTEMBER, 1936,
FOUR SELECTED DEPARTMENTS COMBINED

                   (1)             (2)               (3)
MONTH             SALES         NUMBER OF         AMOUNT OF
                                SALES CHECKS      AVERAGE SALE

August           $48,102           8,013            $6.00
September         45,530           7,660             5.94











The declining tendency of sales, number of sales, and the amount of 
the average sales check caused considerable consternation because 
everyone expected September results to be above the August level. 
When the figures for the entire store became available a sizable expan- 
sion of sales was shown as well as an increase in the amount of the 
average sales check. The question then arose as to what had happened 
to the hitherto reliable preliminary indicator. 

A study of the results by departments is given in Table 44. In De- 
partments I and IV the percentage of decline in the number of sales 
checks exceeded the percentage of decline in sales, so that the amount 
of the average sales check increased. In Departments II and III the 
percentage of increase in number of sales checks was less than the 
percentage of increase in sales, which again resulted in an increase in 
the amount of the average check. It seemed strange then that the 
average check should have decreased for the four departments com- 
bined. The explanation lies in the fact that Departments II and III 
with small average sales checks had increased sales while Depart- 
ments I and IV with somewhat larger sales checks showed great de- 
clines in sales. This combination of changes shifted the weights of 
the four departments so much that the combined ratio declined. 

The change in the amount of the average sales check in Table 44
is dependent upon shifts in sales among the departments and changes
in the average amount purchased per customer. Since the intention was
to measure only the latter change, a means of eliminating the effect of
the former had to be devised. This was done by setting up a standard
distribution of sales checks among the four departments and computing
the amount of the average sales check each month for this standard
distribution.

TABLE 44

SALES, NUMBER OF SALES CHECKS, AND AMOUNT OF THE AVERAGE SALES CHECK
FOR A DEPARTMENT STORE IN AUGUST AND SEPTEMBER, 1936,
FOUR SELECTED DEPARTMENTS COMPUTED SEPARATELY

                          AUGUST                               SEPTEMBER
               (1)        (2)         (3)            (4)        (5)         (6)
DEPARTMENT   Monthly    No. of     Amount of       Monthly    No. of     Amount of
             Sales      Sales      Average         Sales      Sales      Average
                        Checks     Sales Check                Checks     Sales Check
                                   (1) ÷ (2)                             (4) ÷ (5)

I            $10,416    1,010       $10.31         $ 4,293      380       $11.30
II             9,622    1,595         6.03          11,889    1,862         6.39
III           21,840    4,621         4.73          26,400    5,112         5.16
IV             6,224      787         7.91           2,948      306         9.63

Total or
average      $48,102    8,013       $ 6.00         $45,530    7,660       $ 5.94



TABLE 45

COMPUTATION OF THE AMOUNT OF THE AVERAGE SALES CHECK (STANDARDIZED)
FOR A DEPARTMENT STORE IN AUGUST AND SEPTEMBER, 1936

                  STANDARD FIGURES                  AUGUST                     SEPTEMBER
                (1)             (2)             (3)          (4)           (5)          (6)
DEPARTMENT   Standard        Standard         Amount of    August        Amount of    September
             Distribution    Distribution     Average      Standardized  Average      Standardized
             of Sales        of Sales Checks  Sales        Sales         Sales        Sales
             Checks          With Total       Check        (3) × (2)     Check        (5) × (2)
                             of One
                             (1) ÷ 8,182

I               845           .10328          $10.31        1.065        $11.30        1.167
II            1,729           .21132            6.03        1.274          6.39        1.350
III           4,615           .56404            4.73        2.668          5.16        2.910
IV              993           .12136            7.91         .960          9.63        1.169

Total or
average       8,182          1.00000                        5.97                       6.60



The standard distribution of sales checks was obtained by taking the
average monthly number of checks in each department for the year
1935. The selection of this standard was more or less arbitrary, but it
approximated the actual distribution of sales checks among the four
departments. The computation of the standardized average sales check
is shown in Table 45. The standardized distribution of sales checks is
given in column 1. The reduction of these figures so that their sum
is unity is shown in column 2. The computation of the amount of
the standardized average sales check is shown in columns 3 and 4 for
August and in columns 5 and 6 for September. The multiplications are
indicated at the head of the columns. The average sales check increased
from $5.97 in August to $6.60 in September after the effect of shifts
in sales between departments had been adjusted.
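The contrast between the crude and the standardized average sales check can be reproduced from the figures of Tables 44 and 45. The sketch below is in Python; the names are illustrative and do not come from the text. One assumption should be noted: the departmental averages are carried here without rounding, whereas Table 45 multiplies averages already rounded to the cent, so intermediate products may differ slightly in the third decimal.

```python
# Table 44: department -> (monthly sales in dollars, number of sales checks).
AUGUST = {"I": (10416, 1010), "II": (9622, 1595), "III": (21840, 4621), "IV": (6224, 787)}
SEPTEMBER = {"I": (4293, 380), "II": (11889, 1862), "III": (26400, 5112), "IV": (2948, 306)}

# Table 45, column 1: standard distribution of sales checks (1935 monthly average).
STANDARD_CHECKS = {"I": 845, "II": 1729, "III": 4615, "IV": 993}


def crude_average_check(month):
    """Total sales divided by total checks; the weights are the month's own checks."""
    total_sales = sum(sales for sales, _ in month.values())
    total_checks = sum(checks for _, checks in month.values())
    return total_sales / total_checks


def standardized_average_check(month, standard):
    """Weighted average of departmental average checks, with fixed standard weights."""
    total_standard = sum(standard.values())
    return sum(
        (sales / checks) * standard[dept] / total_standard
        for dept, (sales, checks) in month.items()
    )


for name, month in (("August", AUGUST), ("September", SEPTEMBER)):
    crude = crude_average_check(month)
    standardized = standardized_average_check(month, STANDARD_CHECKS)
    print(f"{name}: crude {crude:.2f}, standardized {standardized:.2f}")
```

The crude figure falls from August to September while the standardized figure rises, because only the crude figure allows the shift of checks toward the departments with small average checks to affect the result.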

There is some likelihood that the standard distribution of sales 
checks will lose its representativeness as time goes on. To avoid this 
contingency the distribution of sales checks for the most recent calendar 
year could be used as the standard. The results for long periods of time 
will not be comparable if a changing standard is used, but the purpose 
is merely to get a preliminary judgment from month to month; hence 
the long run situation is unimportant. 

Labor Turnover Example. Another point in our economic system 
at which standardized rates should be employed is in the measure of 
labor turnover. A measure commonly employed is known as the separa- 
tion rate. It is the ratio of the number leaving employment during a 
period to the number on the payroll during the period. For example, 
the separation rate of a manufacturing plant would be measured as 
shown in Table 46.

TABLE 46

COMPUTATION OF MONTHLY LABOR TURNOVER (CRUDE SEPARATION RATE)
OF A MANUFACTURING PLANT

                    (1)                  (2)                   (3)
MONTH           NUMBER OF            NUMBER OF              CRUDE
                EMPLOYEES            EMPLOYEES              SEPARATION
                ON PAYROLL           LEAVING                RATE
                AT BEGINNING         EMPLOYMENT             (per cent)
                OF MONTH             DURING THE MONTH       (2) ÷ (1)

March             4,800                 536                   11.2
November          4,660                 453                    9.7











These figures show an appreciable decline in the separation rate
from March to November. This is a crude rate which for some purposes
may be satisfactory, but it would not be safe to conclude that
there was greater labor stability in this plant in November. It is well
known that out of a group of men hired at any time a certain number
will dislike the work and leave within a few days; others will be found
to be unsatisfactory and will be discharged after a brief trial. The
remainder will continue working for longer periods and those who stay
as long as one or two months are likely to remain with the company
for years. That is, the "employment mortality" decreases with increase
in length of service. It would therefore be desirable to study the
separation rate with the length of employment in the plant
standardized.

The method of doing this is shown in Table 47. The stub shows the 
length of time employees had worked for the company as of March 1. 
Column 3 shows the number leaving employment during the month 
and column 4 shows the separation rate by length of time employed. 
These specific rates show a gradual decline as the length of time em- 
ployed increases. The higher rate for those employed ten years or over 
reflects the separations due to retirement, disability, and death. The 
specific rates follow a course quite similar to that of the specific death 
rates of a population: that is, a high infant mortality rate, a gradual 
decline to past middle life, and then an increase at the upper ages. 

Exactly what should be used as a standard would have to be determined
for each plant separately.⁴ The use of the average distribution
of employees for the last calendar year seemed best suited for the plant 
in question because it had a highly fluctuating labor force, varying 
between 2,500 and 6,000 within three years. The use of a five-year 
or even a three-year standard would have placed too much emphasis 
on a past situation and there was too much variation to attempt the se- 
lection of an arbitrary standard. The standard distribution of employees 
in column 1 was obtained by taking the average of figures similar to 
those in column 2 for the twelve months of the preceding calendar year. 
These one-year averages are expressed in the form of decimals totaling 
unity following the method shown in Table 45, column 2. This form 
saves one step in the computation. If the actual average distribution of 
employees were used in column 1, the entries in column 5 would be the 
number of separations that would have occurred in the standard distri- 
bution of employees at the specific rates for the actual distribution of 
employees in March. The total of the standardized separations divided 
by the total of the standard distribution would give the same standard- 
ized separation rate that was obtained by the method in the table. How- 
ever, using column 1 gives the standardized separation rate directly as the 
total (.1021) of column 5. The same computations give the standardized
separation rate for November (.1026), as shown in columns 6 to 9.
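The equivalence of the two procedures just described can be written compactly. The notation is introduced here for illustration only and does not appear in the table: n_i is the standard number of employees in length-of-service group i, N is the total of the standard distribution, and r_i is the specific separation rate of group i in the month under study.

```latex
R_{\text{standardized}}
  \;=\; \sum_i \frac{n_i}{N}\, r_i
  \;=\; \frac{\sum_i n_i\, r_i}{N},
\qquad N = \sum_i n_i .
```

The first form uses weights totaling unity and yields the standardized rate directly as a column total (.1021 for March). The second form first computes the standardized separations n_i r_i and then divides their total by the total of the standard distribution.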

⁴ If, however, a standardized rate were to be determined for several plants or an
entire industry, the same standard would necessarily have to be used throughout.

[TABLE 47. Computation of the standardized separation rate by length of time
employed, March and November. The body of the table is illegible.]

The summary at the bottom of the table shows that the decline in
the crude separation rate from March to November was entirely due
to change in length of employment. The standardized rate indicates
that there was essentially no change in the forces of labor unrest
leading to separations.⁵

This example and the one dealing with sales checks of a department 
store show the advantage of standardizing ratios in order to separate 
the significant causes of an observed change in the crude ratios. The 
method has not been used much in the past by business statisticians, but 
a wider application can be expected in the future to meet the demand 
for more exact interpretation of the ratios used to measure changes in 
business operations. 

COMPOUND RATIOS 

When a ratio between two ratios is computed the result is known 
as a compound ratio. Such ratios require careful interpretation, because 
the computation is of the form, 

$$\frac{\text{numerator}_1}{\text{denominator}_1} \div \frac{\text{numerator}_2}{\text{denominator}_2} = \text{compound ratio}$$

The result shown by the compound ratio may have arisen from changes 
in either or both numerators, either or both denominators or from 
changes in the numerator and denominator of either or both ratios. 
With so many possible combinations of causes, it is evident that mis- 
interpretation may easily occur. The conservative conclusion, therefore, 
would be to eliminate compound ratios entirely. However, they have 
become an integral part of the business man's use of statistics; hence 
it is preferable to explain their valid use rather than to eliminate a 
technique which has considerable practical value. 

Compound ratios can be divided into three groups: (1) those in 
which the denominators of the two simple ratios are stable; (2) those 
in which the simple ratio used as denominator is stable; and (3) those 
in which all of the constituents fluctuate. The last type of ratio can
be used only as a general indicator of changes while the first two can
be interpreted specifically. The distinction among the three in both
form and interpretation can be seen in the examples that follow. 

⁵ Current information concerning labor turnover in manufacturing plants is released
in mimeographed form monthly by the United States Bureau of Labor Statistics. The
reports include data for "quits," "discharges," "lay-offs," "accessions," separation rates,
and turnover rates. This release does not make use of the refined rates presented in
the text.



Stabilized Denominators 

In this type, one ratio is divided by another whose denominator is 
not identical with that of the first ratio but the difference is so slight 
that for practical purposes the compound ratio is valid. It is subject to 
the same interpretations and comparisons with other compound ratios 
in the same series that could be made if all the denominators were 
identical. For example, a study of changes in the loans and investments 
of member banks of the Federal Reserve System is presented in Table 
48. The decrease in the proportion of assets invested in government 
securities can be seen in column 3 but column 4 provides additional 
information concerning the months in which the decline was most 
pronounced. 

It might appear that the same facts could be shown equally well by
ratios between the successive numerators, in column 2. Comparison
of these simple ratios, listed in column 5, with the compound ratios in
column 4 shows, however, that this is not the case. Column 5 indicates
that the greatest decline in investments in government securities
occurred from March to April, but column 4 indicates that the greatest
decline in the ratio, investments in government securities to total loans
and investments, occurred from February to March. Further, column 5
shows that investments in government securities increased from May
to June, whereas column 4 shows that the ratio of investments in
government securities to total loans and investments declined slightly
from May to June, although the rate of decline was gradually tapering
off. It is evident, therefore, that while both columns 4 and 5 provide
valid comparisons the information contained in column 5 is in no sense
a substitute for that contained in column 4.

TABLE 48

TOTAL LOANS AND INVESTMENTS IN GOVERNMENT SECURITIES OF REPORTING MEMBER
BANKS OF THE FEDERAL RESERVE SYSTEM IN 101 LEADING CITIES,
JANUARY-JUNE, 1937

(1) Total loans and investments* (000,000 omitted)
(2) Investments in government securities* (000,000 omitted)
(3) Government securities as a percentage of total loans and
    investments, (2) ÷ (1)
(4) Compound ratio: per cent that each month's ratio in column 3
    is of the preceding month
(5) Simple ratio: per cent that each item in column 2 is of the
    preceding month

MONTH          (1)        (2)       (3)            (4)             (5)

January      $22,734    $10,493    46.2           ....            ....
February      22,600     10,330    45.7    45.7 ÷ 46.2 = 98.9     98.4
March         22,610     10,008    44.3    44.3 ÷ 45.7 = 96.9     96.9
April         22,280      9,628    43.2    43.2 ÷ 44.3 = 97.5     96.2
May           22,201      9,483    42.7    42.7 ÷ 43.2 = 98.8     98.5
June          22,330      9,515    42.6    42.6 ÷ 42.7 = 99.8    100.3

* Federal Reserve Bulletin, September, 1937, p. 924.
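The arithmetic behind columns 3, 4, and 5 of Table 48 can be retraced with a short sketch. The variable and function names are illustrative only; the figures are those of columns 1 and 2, and column 4 is computed from the rounded percentages of column 3, which is assumed to be how the table was prepared.

```python
# Figures from Table 48 (millions of dollars), January-June, 1937.
months = ["January", "February", "March", "April", "May", "June"]
total_loans_and_investments = [22734, 22600, 22610, 22280, 22201, 22330]  # column 1
government_securities = [10493, 10330, 10008, 9628, 9483, 9515]           # column 2

# Column 3: government securities as a percentage of the total,
# rounded to one decimal place as in the table.
share = [
    round(100 * g / t, 1)
    for g, t in zip(government_securities, total_loans_and_investments)
]


def percent_of_preceding(values):
    """Each value as a percentage of the value before it (None for the first)."""
    return [None] + [
        round(100 * cur / prev, 1) for prev, cur in zip(values, values[1:])
    ]


compound = percent_of_preceding(share)                 # column 4
simple = percent_of_preceding(government_securities)   # column 5

# Month in which each series shows its steepest decline.
steepest_compound = months[compound.index(min(compound[1:]))]
steepest_simple = months[simple.index(min(simple[1:]))]

for row in zip(months, share, compound, simple):
    print(*row)
print("Steepest decline in the ratio (column 4) ends in:", steepest_compound)
print("Steepest decline in securities (column 5) ends in:", steepest_simple)
```

The two series reach their lowest values in different months, which is the point of the comparison: the simple ratio follows the numerator alone, while the compound ratio follows the numerator relative to a denominator that is itself changing.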

Denominator Ratio Stable 

This case arises where one ratio or set of ratios is used as a standard 
to which another ratio or set of ratios is to be compared. Trade and 
mercantile association secretaries frequently employ this form of ratio 
comparison in studying the distribution of costs of doing business of 
individual association members. 

A wholesale grocers' association had 28 members. The members did
not agree to divulge their actual sales or costs of operation, but each
month they reported the per cent which each of the items listed in
Table 49 was of the sales for the month. With the aid of an agreed-upon
set of weights, proportionate to the sales of the individual members,
the secretary of the association computed the average percentage
distribution of the several items making up total sales. This distribution
served as a standard to which each of the members could compare
his own operations. Columns 1 and 2 of the table give the average
distribution and the distribution for one of the members, respectively.
Comparison of the two columns shows the items for which the particular
concern had either better or poorer than average results. However,
the variation with regard to specific items is brought out more
clearly by the compound ratios in column 3.

TABLE 49

PERCENTAGE DISTRIBUTION OF SALES OF WHOLESALE GROCERS INTO COSTS OF
DOING BUSINESS AND PROFIT; CONCERN NUMBER 10 COMPARED
WITH AVERAGE FOR 28 CONCERNS*

                                   PERCENTAGE DISTRIBUTION     (3)
                                          OF COSTS             PERCENTAGE VARIATION
                                   -----------------------     OF CONCERN NO. 10
                                      (1)          (2)         FROM THE AVERAGE
                                   28 Member     Member        (2) ÷ (1) - 100%
ITEMS                              Concerns    Concern No. 10

Administrative expense                6.1          6.4            +  4.9
Rent, interest, and insurance         1.7          4.2            +147.1
Selling expense                       8.1         10.7            + 32.1
Handling expense                      2.0          2.4            + 20.0
Delivery expense                      4.5          6.1            + 35.6
Cost of goods sold                   76.3         69.3            -  9.2
Profit                                1.3           .9            - 30.8

Total                               100.0        100.0

* Data taken from the files of the secretary of a wholesale grocers' association.

These ratios must be interpreted not as the percentage difference
in absolute dollars by which each item of Concern Number 10 differed
from the average, but as a dollar-for-dollar comparison in proportion
to total sales. That is, the actual dollar profit of this concern was not
31 per cent less than the average profit of the 28 together but for every 
dollar of profit that should have been made by Concern Number 10 
according to the standard only 69 cents was realized, despite the fact 
that the cost of goods sold was 9 cents on the dollar less than the 
average. Similarly, this concern's selling expense was 32 cents per 
dollar greater than the standard, the handling expense was 20 cents 
greater, and the delivery expense was 36 cents greater. 
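Column 3 of Table 49 follows from columns 1 and 2 by the rule (2) ÷ (1) - 100%. The sketch below retraces that computation; the dictionary and variable names are illustrative and not part of the original.

```python
# Percentage distribution of sales from Table 49:
# item -> (average of 28 member concerns, Member Concern No. 10)
table_49 = {
    "Administrative expense": (6.1, 6.4),
    "Rent, interest, and insurance": (1.7, 4.2),
    "Selling expense": (8.1, 10.7),
    "Handling expense": (2.0, 2.4),
    "Delivery expense": (4.5, 6.1),
    "Cost of goods sold": (76.3, 69.3),
    "Profit": (1.3, 0.9),
}

# Column 3: percentage variation of Concern No. 10 from the average.
variation = {
    item: round(100 * concern / average - 100, 1)
    for item, (average, concern) in table_49.items()
}

for item, pct in variation.items():
    print(f"{item:32s} {pct:+7.1f}")

# Cents realized per dollar of profit expected under the standard.
cents_per_standard_dollar = round(100 + variation["Profit"])
print("Profit realized per standard dollar (cents):", cents_per_standard_dollar)
```

Because both columns are percentages of each concern's own sales, the result says nothing about absolute dollars; it only compares how each dollar of sales was divided.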

The fact that selling, handling, and delivery expense exceeded the 
average was no cause for alarm. Concern Number 10 did a large 
proportion of its total business in fresh fruits and vegetables. It used 
a unique system of selling. Trucks were sent daily through all of the 
territory served by the concern. On the theory that truck drivers are 
not good salesmen, a salesman who had nothing to do with handling 
goods rode on each truck. His duty was to sell goods which the truck 
driver immediately delivered. This system added to the concern's sell- 
ing costs and its delivery costs. The high handling costs arose from 
the nature of the goods. That is, car-load shipments of oranges, grapes, 
tomatoes, and similar things required extra handling on account of 
spoilage, the need for protection from cold weather, and the necessity 
of moving the goods quickly, so that the higher operating costs were 
to be expected. The lower cost of goods was likewise clear enough 
because perishable goods must be marked up a greater percentage on 
cost to provide for waste, spoilage, and the additional cost of quick 
sales. The small profit margin, however, was unsatisfactory and further 
investigation was undertaken. 

The administrative expense was not much above the average but the 
investigation indicated that some economies might be made at that 
point. However, the great variation in the percentage of sales going 
for rent, interest, and taxes was a revelation to the owner. Further 
study showed that rent made up a great part of the total of this item. 




This raised the question whether the concern could reduce the floor 
space used. The rent paid for the combined office and warehouse 
building in which most of the business was conducted seemed to be 
quite low, but two auxiliary warehouses in which bulky commodities 
such as potatoes were stored had been leased at high rentals. By re- 
arranging the use of space in the main warehouse and reducing the in- 
ventories of these bulky commodities, the concern was able to abandon 
the use of both auxiliary buildings when the leases expired. The charge 
for rent was thereby greatly reduced without any increase in other 
costs. The saving was therefore reflected directly in increased profits. 
This example shows what can be done through the interpretation 
of compound ratios when one of the two sets of ratios is used as a 
standard. The members of the association were able, through the use 
of the association secretary's report, to compare their own operations 
with a standard established under similar conditions. Yet no member 
of the association had divulged any information which he desired to 
be confidential. 

Fluctuating Numerators and Denominators 

Table 50 shows the computation of the change in the ratio of accounts
receivable to sales from August to September. The 25 per cent
increase in the ratio of receivables to sales in September, computed
as follows, (50 ÷ 40) × 100% - 100% = +25%, cannot be given any specific
interpretation. Any information concerning the reasons for the increase 
must be obtained from a study of the original data from which the 
simple ratios were computed. Both the receivables and the sales 
increased but the receivables increased 50 per cent while the sales 
increased only 20 per cent, thus accounting for the increase in the 
ratio. Further study would be needed to discover why the receivables 
increased so much. The indications from the figures are that all of 
the increase in sales was credit business. If such proved to be the case 
management would have to consider the implications of a continuation 
of the same tendency in the future. 

All of the interpretation of the table arose from study of the original 
data rather than the ratios. The ratio of receivables to sales would 
have indicated the change that took place just as well as the compound 
ratio. There is, consequently, no advantage in computing the compound
ratio beyond that of further summarizing the information contained
in the simple ratios.
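The figures of Table 50 can be worked through as a short sketch. The function name percent_change is illustrative; the dollar amounts are those of the table. The last step shows why the compound ratio alone cannot be interpreted: the same +25 per cent would result from any pair of changes in receivables and sales standing in the proportion 1.50 to 1.20.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old


# Figures from Table 50 (dollars).
receivables = {"August": 20_000, "September": 30_000}
sales = {"August": 50_000, "September": 60_000}

# Simple ratios: receivables as a percentage of sales in each month.
ratio = {month: 100 * receivables[month] / sales[month] for month in receivables}

# Compound ratio: change in the simple ratio from August to September.
ratio_change = percent_change(ratio["August"], ratio["September"])

# Changes in the original data, which carry the actual explanation.
receivables_change = percent_change(receivables["August"], receivables["September"])
sales_change = percent_change(sales["August"], sales["September"])

# The change in the ratio is fixed by the two underlying changes.
implied_ratio_change = 100 * (
    (1 + receivables_change / 100) / (1 + sales_change / 100) - 1
)

print("Ratio of receivables to sales:", ratio)
print("Change in ratio (per cent):", ratio_change)
print("Change in receivables (per cent):", receivables_change)
print("Change in sales (per cent):", sales_change)
```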



TABLE 50

COMPUTATION OF PERCENTAGE CHANGE IN RATIO OF RECEIVABLES TO SALES
FROM AUGUST TO SEPTEMBER

                                            RATIO OF        CHANGE FROM
               ACCOUNTS                     RECEIVABLES     PRECEDING
MONTH          RECEIVABLE      SALES        TO SALES        MONTH
                                            (per cent)      (per cent)

August          $20,000       $50,000          40              ....
September        30,000        60,000          50              +25
There are many applications of compound ratios in the analysis of 
business conditions. Those uses in which the denominators of both the 
simple ratios or both terms of the denominator ratio are stable will 
lead to results which can be interpreted rather exactly. On the other 
hand those cases in which the numerators and denominators of both 
ratios vary freely are more difficult to interpret. Because of this diffi- 
culty it is better to avoid the chance of misinterpretation by using such 
compound ratios merely as a guide to information which may be 
obtained from study of the simple ratios involved. 



EXAMPLES OF THE USE OF RATIOS IN BUSINESS 

In this chapter and the preceding one considerable emphasis has 
been placed upon the widespread use of ratio analysis of business data, 
and numerous illustrations have been presented. Such usage ranges all 
the way from the simplest percentages to highly specialized ratios. 
Some of the latter are of particular interest because of the comprehen- 
sive analysis of business activity which results from their use. Four 
examples of specialized ratio analysis are presented here: (1) a rail- 
road analysis; (2) a retail credit department analysis; (3) a depart- 
ment store analysis; and (4) a financial statement analysis. 

Railroad Analysis 

These ratios originate partly in the Interstate Commerce Commission
and partly in the Association of American Railroads. They pertain
to several phases of railroad operation as indicated in Table 51. A
complete explanation of the ratios would require too much space in
this book. Each ratio provides specific information concerning a certain
phase of railroad operations. For example, row (c), the freight revenue
per train-mile, and row (h), the passenger revenue per train-mile, contain
ratios showing the income from each type of service per unit of
transportation employed. The compound units, freight-train mile and
passenger-train mile, are designed especially to measure the basic
operation of railroading, the movement of a train over a mile of track.

TABLE 51

RATIOS USED IN THE ANALYSIS OF RAILROAD OPERATIONS IN THE UNITED STATES
DATA FOR SELECTED YEARS SHOW THE TREND OF OPERATIONS*

RATIO                                   UNIT          1900     1910     1920     1930     1935

Freight Operations
(a) Revenue freight per train-mile      ton            271      380      639      699      646
(b) Revenue ton-miles per mile          thousand
    of road                             ton-miles      735    1,071    1,597    1,481    1,185
(c) Revenue per train-mile              dollar        2.00     2.86     6.81     7.56     6.51
(d) Revenue per ton-mile                cent          .729     .753    1.069    1.074    1.000

Passenger Operations
(e) Average journey per passenger       mile          27.8     33.5     37.3     38.0     41.3
(f) Av. passengers per train-mile       person          41       56       80       49       47
(g) Revenue per passenger per mile      cent          2.00     1.94     2.76     2.72     1.94
(h) Revenue per train-mile              dollar        1.01     1.30     2.78     1.85     1.35

Finance
(i) Investment per mile of road         dollar      61,490   64,382   81,954  105,661  106,339
(j) Taxes per mile of road              dollar         233      431    1,262    1,519    1,062
(k) Operating income per mile
    of road                             dollar       7,722   11,866   24,361   20,564   14,483

* Compiled from reports of the Interstate Commerce Commission and the Association of
American Railroads.

The difference between the trends of freight and passenger traffic 
can be seen by studying the figures for these two ratios in the table. 
Freight revenue per train-mile increased each decade through 1930 but 
by 1935 had declined to $6.51, a drop of 14 per cent from $7.56, the 
high value of 1930. On the other hand passenger revenue per train- 
mile declined from $2.78 in 1920 to $1.85 in 1930, a drop of 34 per 
cent. A further decrease brought the 1935 revenue to about the 1910 
level. The great decline in passenger revenue per train-mile can be
understood by reference to "average passengers per train-mile,"
row (f). In order to render passenger service, the railroads were forced
to run their trains even though the number of passengers per train-mile 
in 1935 was little greater than the number carried in 1900. In contrast 
the number of tons of freight per train-mile, row (a), rose steadily 
to a high value in 1930 and was only slightly smaller in 1935. The 
ability of the railroads to control the size and the frequency of operat- 
ing freight trains has resulted in a much smaller decline in revenue per 
freight-train-mile than that experienced per passenger-train-mile. 
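The percentage declines cited above can be retraced from rows (c) and (h) of Table 51. The sketch below is illustrative only; the dictionary and function names are not part of the original, and the figures are those printed in the table.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old


# Revenue per train-mile, in dollars, from Table 51.
freight_revenue = {1900: 2.00, 1910: 2.86, 1920: 6.81, 1930: 7.56, 1935: 6.51}    # row (c)
passenger_revenue = {1900: 1.01, 1910: 1.30, 1920: 2.78, 1930: 1.85, 1935: 1.35}  # row (h)

# Freight: decline from the 1930 high to 1935.
freight_drop = percent_change(freight_revenue[1930], freight_revenue[1935])

# Passenger: decline from 1920 to 1930, and the 1935 figure against 1910.
passenger_drop = percent_change(passenger_revenue[1920], passenger_revenue[1930])
passenger_1935_vs_1910 = passenger_revenue[1935] - passenger_revenue[1910]

print(f"Freight revenue per train-mile, 1930 to 1935: {freight_drop:.1f} per cent")
print(f"Passenger revenue per train-mile, 1920 to 1930: {passenger_drop:.1f} per cent")
print(f"Passenger revenue, 1935 less 1910: ${passenger_1935_vs_1910:.2f}")
```

The freight decline comes to about 14 per cent and the passenger decline to about a third, in line with the figures quoted in the text.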

The ratios of Table 51 are all of the same type and can be cata- 
logued as follows: 




1. They are ratios between unlike items expressed in different units, 
including a number of compound units. 

2. All of the ratios are refined to some extent in both numerator
and denominator, e.g., "revenue ton-miles" excludes non-revenue
freight and empty cars transported; "miles of road" is defined to exclude
yard and terminal multiple trackage, all side tracks and auxiliary
tracks and duplicate main tracks (a four-track line 100 miles in length
is counted as 100 miles of road).

3. No compound or standardized ratios are included. 

Retail Credit Department Analysis 

A set of ratios designed to measure the activities of a retail credit 
department was published at the University of Michigan. 

This investigation was undertaken to learn and make generally available 
facts regarding the costs, problems, and performances of credit and accounts 
receivable departments in retail stores. 

From the outset, the study was planned so that as many as possible of the 
resulting facts would take the form of typical figures which could be used as 
standards for appraising performance in individual stores. 6 

The ratios which were developed in the study have been in general 
use since then. They were the following: 

Per cent of returns to net [credit] sales 7 
Per cent of credit office payroll to net [credit] sales 
Per cent of accounts receivable office payroll to net [credit] sales 
Per cent of total payroll [credit department] to net [credit] sales 
Per cent of losses from bad debts to net [credit] sales 
Payroll cost per transaction in the accounts receivable office 
Number of transactions handled yearly per accounts receivable office employee 
Average yearly salary in the credit office, in the accounts receivable office, 
and in both offices combined 

Per cent of collections to first-of-the-month outstandings 
Per cent of credit sales to total store sales 

These ratios are quite specialized but that is exactly what is needed
in dealing with the peculiar type of work performed by the credit
department. Their use enables credit managers to follow very closely
the efficiency of collections. Where credit information including
volume and character of operations of the stores in an area is pooled
in the hands of an association, ratio analysis of credit conditions for
the entire area becomes possible. There is tremendous advantage to
the individual stores in doing this, as it provides a standard with which
each can compare its own results.

6 Carl N. Schmalz, "Operating Statistics for the Credit and Accounts Receivable
Departments of Retail Stores 1927," Michigan Business Studies, Vol. I, No. 6 (June, 1928).
Bureau of Business Research, School of Business Administration, University of Michigan,
Ann Arbor, Michigan.

7 Net credit sales include both charge accounts and installment accounts. In the study
the information was obtained separately for the two types of accounts, so that the ratio
analysis could be made for the two separately or combined.

Analysis of Department Store Operations 

The Bureau of Business Research of the Harvard University Gradu- 
ate School of Business Administration publishes an annual report 
analyzing the operations of department stores. The report is based on 
information supplied by some 600 department and specialty stores 
located in cities in all parts of the country. A list of the ratios used 
in analyzing the information is shown in Table 52. 

TABLE 52

RATIOS PERTAINING TO OPERATIONS OF 55 DEPARTMENT STORES IN THE UNITED STATES
HAVING SALES BETWEEN $2,000,000 AND $4,000,000 IN 1934*

                                                              RESULT
RATIO                                                         OBTAINED

Net gain
    Percentage of net sales ................................    2.4%
    Percentage of net worth ................................    4.6%
Rate of stock turn (times a year)
    Based on beginning and ending inventories ..............    4.4
    Based on monthly inventories ...........................    3.8
Returns and allowances
    Percentage of gross sales ..............................    8.5%
    Percentage of net sales ................................    9.3%
Distribution of net sales
    Cash ...................................................   42.0%
    C.O.D. .................................................    5.3%
    Charge .................................................   45.8%
    Installment ............................................    6.9%
Sales per square foot of total space .......................  $13.70
Real estate cost per square foot of total space ............   $0.69
Sales per employee .........................................  $5,500
Losses from bad debts (percentage of charge sales) .........    0.95%

* Selected from Tables 19 and 20 of Carl N. Schmalz, "Operating Results of Department
and Specialty Stores in 1934," Bureau of Business Research Bulletin No. 96 (June, 1935), Boston:
Graduate School of Business Administration, Harvard University, pp. 27 and 28.

The ratios are self-explanatory. They provide a variety of informa- 
tion concerning operations, all of which is essential to management. 
The changes in these ratios from year to year indicate trends in de- 
partment-store operation. Likewise the figures for any year serve as a 
standard to which an individual store can compare its results. For ex- 
ample, a store of comparable size finding that its stock turnover was 
less than 1.0 per year would want to take steps to find what types of 




merchandise were moving slowly and whether it would be feasible 
to reduce inventory or expand sales, or both. A store with 3 to 4 per 
cent bad debt losses would want to investigate conditions in the credit 
and accounts receivable departments. Similar variations in any of the 
ratios would lead management to investigate. Weak places in the 
organization can frequently be discovered and corrected through this 
type of analysis and comparison. 
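Two of the entries in Table 52 express the same quantity on different bases, and the connection can be checked in a few lines. The sketch assumes that net sales equal gross sales less returns and allowances; the source does not state this definition, so it is an assumption, and the variable names are illustrative.

```python
# Returns and allowances as a percentage of gross sales (Table 52).
returns_pct_of_gross = 8.5

# Assumption: net sales = gross sales - returns and allowances.
# Per $100 of gross sales, net sales are then 100 - 8.5 = 91.5.
net_sales_per_100_gross = 100 - returns_pct_of_gross

# The same returns restated on the smaller base of net sales.
returns_pct_of_net = 100 * returns_pct_of_gross / net_sales_per_100_gross

# The four classes of net sales should account for the whole.
distribution_of_net_sales = {"Cash": 42.0, "C.O.D.": 5.3, "Charge": 45.8, "Installment": 6.9}
distribution_total = sum(distribution_of_net_sales.values())

print(f"Returns as a percentage of net sales: {returns_pct_of_net:.1f}")
print(f"Distribution of net sales totals: {distribution_total:.1f}")
```

Under that assumption the 8.5 per cent of gross sales becomes 9.3 per cent of net sales, which agrees with the table and shows how much a ratio can shift merely through the choice of denominator.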

Analysis of Financial Statements 

A method of rating a borrower as a credit risk has been developed 
by Alexander Wall. 8 By the use of seven ratios a numerical basis is 
provided whereby bank executives are materially aided in determining 
the lines of credit that can be granted to customers. For a detailed 
explanation of the use of these ratios it would be best to read the 
reference given. We are interested mainly in the statistical application 
of ratios involved and that can be explained most readily with the aid 
of an example. 

Table 53 shows the balance sheet and annual sales of a concern for 
a five-year period. From these annual reports the seven ratios of Table 
54 can be computed for each year and for the five years combined. 
The exact source of the numerator and denominator of each ratio is 
indicated by the letters in parentheses. Thus the ratio of net worth 
to debt for the first year (n ÷ m) is obtained as follows: ($492,105 ÷
$156,094) = 3.15 or 315 per cent, and all of the other ratios are
obtained by similar computations. 

Some of these ratios have rather high values and in general very 
careful study is required to judge the concern's standing as set forth 
in Table 54. The next step, therefore, is to refer the individual ratios 
to a standard. The standard chosen is the set of average ratios for 
the five-year period shown in column 6 of Table 54. The compound 
ratios in columns A to E of Table 55 are obtained by dividing each 
of the ratios in columns 1 to 5 of Table 54 by the standard ratios in 
column 6 of Table 54. 
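The two steps just described can be sketched for a single ratio, net worth to debt, using rows (n) and (m) of Table 53. The standard is taken here as the ratio of the five-year totals; the text calls it the average for the five-year period, so that reading is an assumption, and the variable names are illustrative.

```python
# Rows (n) and (m) of Table 53, first through fifth year (dollars).
net_worth = [492_105, 469_685, 474_745, 454_175, 444_402]
total_debt = [156_094, 157_357, 227_817, 324_920, 292_520]

# Simple ratios: net worth to debt for each year (n ÷ m).
yearly_ratio = [n / m for n, m in zip(net_worth, total_debt)]

# Assumed standard: the same ratio computed from the five-year totals.
standard_ratio = sum(net_worth) / sum(total_debt)

# Compound ratios: each year's ratio as a percentage of the standard.
compound_ratio = [100 * r / standard_ratio for r in yearly_ratio]

for year, (r, c) in enumerate(zip(yearly_ratio, compound_ratio), start=1):
    print(f"Year {year}: net worth to debt {r:.2f}, {c:.0f} per cent of standard")
print(f"Five-year standard: {standard_ratio:.2f}")
```

A compound ratio above 100 marks a year in which the concern stood better than its own five-year standard on this measure, and one below 100 a year in which it stood worse.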

These standardized ratios could then be averaged to obtain a credit 
index. But they are not all of equal importance. The next step, there- 
fore, is to establish a set of weights that can be used to give the 
several ratios their proper importance in determining the credit index. 

8 Alexander Wall and Raymond W. Duning, Ratio Analysis of Financial Statements. 
New York: Harper & Bros., 1928. 



TABLE 53

ASSETS, LIABILITIES AND SALES OF A HYPOTHETICAL CONCERN ANNUALLY FOR A
FIVE-YEAR PERIOD*

                              FIRST      SECOND       THIRD      FOURTH       FIFTH     TOTAL FOR
                               YEAR        YEAR        YEAR        YEAR        YEAR    FIVE YEARS

ASSETS
(a) Cash                     36,285      34,479      27,776      37,206      51,157       186,903
(b) Receivables             229,437     208,363     220,666     231,171     233,540     1,123,177
(c) Inventory               236,586     208,376     245,367     265,304     255,004     1,210,637
(d) Total current           502,308     451,218     493,809     533,681     539,701     2,520,717
(e) Plant and equipment     115,389     146,884     169,045     170,195     132,037       733,550
(f) Miscellaneous            30,502      28,940      39,708      75,219      65,184       239,553
(g) Total fixed             145,891     175,824     208,753     245,414     197,221       973,103
(h) Total                   648,199     627,042     702,562     779,095     736,922     3,493,820

LIABILITIES
(i) Payables                139,894     152,455     220,539     308,880     255,459     1,077,227
(j) Taxes                    12,030         719       4,083       5,779       7,218        29,829
(k) Miscellaneous             4,170       4,183       3,195      10,261      29,843        51,652
(l) Total current           156,094     157,357     227,817     324,920     292,520     1,158,708
(m) Total debt              156,094     157,357     227,817     324,920     292,520     1,158,708
(n) Net worth               492,105     469,685     474,745     454,175     444,402     2,335,112
(o) Total                   648,199     627,042     702,562     779,095     736,922     3,493,820

(p) Sales                 2,590,000   2,590,000   2,660,910   3,068,720   3,262,808    14,172,438

* Wall and Duning, op. cit., p. 297.

These weights are shown in column F of Table 55. They depend
mainly on the accumulated judgment of Mr. Wall and his associates. 9
The final step in the computation of the index is shown in columns
G to K of Table 55. The weighted combined result appears in index
form at the foot of columns G to K, Table 55. The concern shows
a strong credit position in the first year, is somewhat weaker the second
year, but still has a high index. Decided weakness appears in the last
three years, but the last year demonstrates marked recuperative powers
in the concern as compared with the fourth year. 10

9 The complete argument for the use of weights and the reasons for the particular set
selected are given in the source, pp. 157-62.

10 Those interested in the interpretation of the credit index should read from the
source, pp. 299-306.

[TABLE 54 and TABLE 55 are not legible in this copy. As described in
the text, Table 54 gives the seven ratios for each of the five years
(columns 1 to 5) and for the five years combined (column 6); Table 55
gives the compound ratios (columns A to E), the weights (column F),
and the weighted ratios with the resulting credit index (columns G
to K).]

Summary

Many additional examples of the application of ratios in the analysis
of business operations might be cited. Those presented here have been
chosen mainly to demonstrate the variety of practical uses of ratios.
The reader will also note the extent to which the ratios are specialized
to meet the needs of the particular type of business to which they are
applied. These ratios have been developed as the result of long
experience and careful study. They are powerful tools of analysis in
the hands of skilled investigators. Their use by less well-trained persons,
who are unable to adhere to the basic principles of ratios which
underlie all of the more complicated applications, may easily lead to
gross error in the interpretation of results. All of which points to a
final observation. Ratios are perhaps the most widely used statistical
technique, but it is also true that no other technique produces an equal
amount of misuse in practice.



PROBLEMS 

1. What refinement would you recommend in the denominator of each of 
these ratios? 

a) Employees killed in train accidents to total number of employees of 
railroads. 

b) The number unemployed in a community to the number of persons 
in the community. 

c) The ratio of investments of banks to loans and investments to show 
the tendency over a period of years toward increased importance of 
investments in bank portfolios. 

2. The following data were compiled from reports of the United States 
Bureau of Census and the American Medical Association: 



                     PHYSICIANS PER            DEATH RATE PER 1,000
                     1,000,000 POPULATION      POPULATION IN THE
STATE                IN THE U. S.              REGISTRATION AREA (1927)

California                 1,781                       13.9
New York                   1,669                       12.3
Massachusetts              1,552                       11.6
Maryland                   1,501                       13.2
Illinois                   1,492                       11.4
Pennsylvania               1,251                       11.4
New Jersey                 1,078                       11.2
Wisconsin                  1,056                       10.1


10.1 













a) Why is the ratio of physicians to population highest in those states 
in which the death rate is highest? 

b) From the figures given compute the deaths per physician in each state. 
Is it true that the deaths per physician are highest where the ratio of 
physicians to population is lowest? Explain the result. 



3. The following data* appear to indicate that the rate at which workers were
re-employed in private industry from Relief and W.P.A. rolls was lower
at the peak of industrial expansion in 1937 than in 1935.

                          MARCH, 1935                      MARCH, 1937
                   ---------------------------     ---------------------------
                   Number of      Number of        Number of      Number of
Duration of        Unemployed     Workers Leaving  Unemployed     Workers Leaving
Unemployment       Workers on     Relief and       Workers on     Relief and
(number of         Relief and     W.P.A. Because   Relief and     W.P.A. Because
months)            W.P.A.         of Private       W.P.A.         of Private
                                  Employment                      Employment

Less than 2           110              15              104             12
2 to 4                152              23              131             16
4 to 6                204              20              147             14
6 to 9                219               9              222             10
9 to 12               255               7              260              9
12 to 18              392               7              415              8
18 to 24              477               7              573             10
24 to 36              515               5              521              6
36 to 60              404               4              397              4
60 and over           651               5              583              5

Total               3,379             102            3,353             94

Over-all rate:
March, 1935: 102 ÷ 3,379 = 3.02%
March, 1937: 94 ÷ 3,353 = 2.80%

* The figures are the result of making necessary adjustments to insure comparability
in the records available in a city which had about 28,000 gainfully employed according
to the 1930 census.

a) Compute the re-employment rate according to duration of unemploy- 
ment for each period. 

b) Do the results confirm the decline shown by the over-all rate? If not 
explain the discrepancy and devise a plan for the construction of 
comparable over-all rates. 

4. State the three types of relation between the four elements of a compound 
ratio. 

5. Given the following data concerning population and number of persons 
paying income tax in the United States: 





                                                               1929           1936

1. Population (estimated)                                121,945,000    128,883,000
2. Number filing income tax returns                        4,044,327      5,413,499
3. Percentage of population filing income tax returns           3.32           4.20
4. Percentage increase in number filing returns in 1936         ....         +26.51
5. Number of income tax returns by persons with net
   income over $5,000                                      1,032,071        677,011
6. Percentage of those filing returns who had income
   over $5,000                                                 25.52          12.51
7. Percentage decrease in 1936 in number filing returns
   who had income over $5,000                                   ....         -50.98

a) Are the per cents in rows 4 and 7 equally valid? Why or why not?

b) Do these two per cents show that income has increased in the middle
brackets at the expense of larger incomes, i.e., that wealth is gradually
becoming more evenly distributed? Discuss.

6. What basic error would be involved in averaging the ratios in column 3 
of Table 49 of the text? 

7. The following data are computed from annual reports of the United States 
Steel Corporation: 





                                      1929        1939

Output per man-hour                  $1.80       $2.48
Average wage per man-hour             .674        .897









Labor leaders argue from these ratios that labor has not received a fair 
share of its increased productivity. What further evidence should be intro- 
duced before reaching a conclusion on this point? 

8. Which of the credit department ratios on page 290 are favorable when 
they increase? Which are favorable when they decrease? Explain in each 
case. 

9. Compute from Table 52 the change in net gain as a percentage of net 
sales, if all losses from bad debts were eliminated. 

10. a) Explain the two methods of measuring inventory used in obtaining
the stock turnover in Table 52.

b) Which of the two is more exact? Why?

11. The heading of the stub of Table 55 of the text is "Compound Ratios." 
Which of the three types of compound ratios are these? 



REFERENCES 

BURGESS, ROBERT W., Mathematics of Statistics. Boston: Houghton Mifflin 
Co., 1927. 

Chapter III contains an explanation of the effect on the resulting ratios 
when weights are shifted. 

RHODES, E. C., Elementary Statistical Methods. London: George Routledge
and Sons, Ltd., 1933. 

Chapter 5 gives an excellent exposition of the use of refined ratios. 

WALL, ALEXANDER, and DUNING, RAYMOND W., Ratio Analysis of Financial
Statements. New York: Harper & Bros., 1928. 

Chapters V-XII deal specifically with ratio analysis, although the entire 
book should be read to understand the system of analyzing financial state- 
ments. 



CHAPTER XIII 
GRAPHS 

INTRODUCTION 

ANY representation of statistical values and relationships in
pictorial or diagrammatic form is called a statistical graph. 
There are many forms in which such graphs are commonly 
used, and new and ingenious devices are constantly being worked out 
for the graphic presentation of statistical data. 

Reasons for Using Statistical Graphs 

Graphic methods have been developed to meet the needs of two 
major classes of users, the lay reader and the statistician. 

For the Lay Reader. It is a well-recognized psychological principle 
that a direct visual concept such as color or size can be more quickly 
grasped and more readily understood than a set of numbers and table 
headings. A statistical table must first be read and then translated
mentally into the actual concept of dollars of wages, bushels of wheat, 
etc. Although this is a process that is no more difficult than the 
reading of any other kind of printed material, the average reader is 
so frightened at the mere sight of a set of figures that he is likely to 
shy away from any table without even trying to see what it is about. 
In order to lure the attention of such readers, attractive graphs are 
an almost indispensable accompaniment to any popular exposition of 
statistical material. 

For the Statistician. Statisticians have also discovered that appro- 
priate graphic methods have sometimes clarified relations that remained 
obscure even after careful study of the numerical data. Graphs of 
analysis have therefore become a tool of the statistician for his own 
benefit as well as a medium for explaining his final results to others. 

Purposes Served by Graphic Methods 

Standard methods can best be analyzed by reference to some exam-
ples of a few well-known kinds of statistical graphs. Four of these
basic types are illustrated in Figure 35. The first, A, is a simple bar
graph in which the length of a bar represents the population in each
of the three cities. In B, the total cash farm income in the United
States is represented by the complete area of a circle, and the pro-
portion derived from each source of income is indicated by a sector
of the circle, each sector shaded to distinguish it from the others.
Two squares are shown in C, the areas of which indicate the number
of tons of steel produced by the United States and by the rest of the
world in a given year. A number of equal-sized symbols in D indicates
the approximate strength of the air forces of several countries in
1940.

FIGURE 35
TYPES OF GRAPHS

[Panel titles:
A. Population of Three Leading Cities on the West Coast, 1930 (bars for
Los Angeles, San Francisco, and Seattle; scale in thousands)
B. Sources of Cash Farm Income, United States, 1938 (millions of dollars)
C. Production of Steel Ingots and Castings by United States and Rest of
the World, 1939 (U. S., 32 million tons; rest of world, 68 million tons)
D. Air Forces of Leading Countries, September 1940, Including Training
Ships (Germany, Gr. Britain, Russia, Italy, Japan, U. S. A.; each symbol
equals 2000 planes)]

From these examples the first statistical purpose of graphic methods 
can be observed. 

To Show Numerical Values and Relationships. In place of the 
actual figures that appear in the cells of a table, numerical values are 
represented by diagrams. These may consist of geometric forms such 
as the length of a line, the number of degrees in the sector of a circle, 
a square or other area, or they may contain a small number of symbols 
that scarcely need to be counted. 

The numerical relationships between these values also can be grasped 
instantly without the necessity of reducing them to the form of ratios 
or other statistical measures. Referring again to Figure 35: In A the 
eye unconsciously estimates the lengths of the three bars, so that it is 
not even necessary to read the scale to perceive the relation of the 
populations of the three cities. In B the order of importance of the 
various sources of cash farm income and the proportion of each to 
the total can be seen without measurement. Likewise in D, the relative 
strength of the air forces of the several countries can be estimated by 
comparing the lengths of the several rows of symbols, without either 
counting the symbols or knowing how many units each symbol stands 
for. It is not so easy to compare the size of the two squares in C, and 
for this reason comparison by means of areas 1 is considered less 
effective than linear comparisons. 
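The difficulty with areas can be made concrete by a short computation. The
sketch below, in Python, takes the two values labelled in Figure 35-C (32 and
68 million tons) and compares the ratio of the values with the ratio of the
sides of two squares whose areas are proportional to them. The function name
square_side is illustrative only and does not come from the text.

```python
import math

def square_side(value, unit_area=1.0):
    """Side of a square whose area is proportional to `value`."""
    return math.sqrt(value * unit_area)

us, rest = 32.0, 68.0  # million tons, as labelled in Figure 35-C

value_ratio = rest / us                           # what the graph must convey
side_ratio = square_side(rest) / square_side(us)  # what the eye tends to measure

print(f"ratio of values: {value_ratio:.3f}")
print(f"ratio of sides:  {side_ratio:.3f}")
```

Two bars drawn for the same values would have lengths in the ratio 2.125 to 1,
whereas the sides of the squares differ only in the ratio of about 1.46 to 1,
so a reader who judges by the sides will understate the difference.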

The simple types of graphic form in these illustrations show nothing 
except numerical values and relationships. No attempt has been made 
to introduce other characteristics of the data such as the spatial rela- 
tionships between the cities in Part A, between the United States and 
the rest of the world in C, or between the various countries in D. 
However, the pictorial representation of non-numerical relationships 
such as these is the second major purpose of some statistical graphs. 

1 Perspective drawings of three dimensional objects in which values are represented
by volumes are still more difficult to evaluate.

To Show Non-numerical Relationships. By the use of more com-
plete and specialized types of graphs, time, space, or qualitative attri-
butes can be presented in addition to numerical values. This is true
in spite of the fact that the number of possible methods for translating
data into diagrammatic form is essentially limited to the few that 
have already been illustrated. There are not more than half a dozen 
methods altogether that can be used on a plane surface such as a 
page or a wall chart, but the number of ways in which dots, lines, 
areas, etc., may be combined in different arrangements has led to a 
seemingly endless variety of graphic types. 

Every statistical table, except those in which the grouping is purely 
quantitative, contains in the stub or column headings a classification 
according to one of the non-numerical characteristics. In order to 
emphasize relationships between classes more realistically than is pos- 
sible in tabular form, the data can be presented in certain specialized 
kinds of graphs. 

Space: An alphabetical list of states conveys no picture whatever of 
actual geographical relationships; for instance, that Missouri is just 
across the river from Illinois; that Idaho, Nevada, and Utah form a 
group of mountain states; and that all the largest cities on the eastern 
seaboard fall within a radius of a few hundred miles. A statistical 
map in which certain numerical values pertaining to each state are 
depicted in their actual geographical arrangement lends itself to much 
more penetrating analysis of spatial relations. 

Time: A column of dates may indicate successive years, while the
adjacent column shows the volume of production in each year. This 
tabular form of presentation gives no such vivid concept of actual time 
movement and of the accompanying growth or decline in production 
as is afforded by a line whose movement across a graph one can follow 
along with the passage of the years. 

Attributes: Qualitative attributes are difficult to represent graphi- 
cally except by symbols. If black bars and shaded bars are used to 
represent male and female respectively, one has to read the key in 
order to have the slightest inkling of what the bars stand for. Rows 
of different symbols may be used instead of bars, as in Figure 35-D. 
This is one-third of the original graph. The middle section contained 
rows of warships representing the sea power of the six countries and 
the lower section contained rows of soldiers representing the armed 
forces of the six countries. In the three parts of the graph, therefore, 
unlike units rather than different attributes were being distinguished, 
but the same idea can be carried over to symbols representing attributes 
such as urban and rural, white and Negro, etc. 




Construction of Graphs 

Tests of a Good Graph. Regardless of the particular purpose, every 
graph can be tested by one general criterion: The method of graphic 
construction is good if it produces an effective picture of important 
relationships and gives a true representation of statistical facts. Con- 
versely, a method that results in an ineffective, uninteresting graph, or 
that distorts statistical facts, is bad.

Steps. The steps in making a graph are the same as those followed 
unconsciously by a reader of the graph, but in reverse order. The 
reader's attention is first caught by the effectiveness, or attractiveness, of
a graph. If his interest is aroused, he then studies the graph and notes 
what information is presented through the various devices that have 
been used. Without obvious effort on his part, the facts and rela- 
tionships which the graph was intended to emphasize will become 
clearly impressed upon his mind. 

Interpret the data: For the statistician who plans and constructs the 
graph, however, this process of interpretation which is so easy for the 
reader becomes the first and most important problem. Before planning 
to illustrate any given set of data by means of a graph he must decide 
what relationship he intends to emphasize. He may wish to compare 
percentage changes in each of several commodities over a period of 
years, or their values in absolute amounts, or their percentage relation, 
either to each other or as parts of a total value in each year. The 
example of various ratio comparisons from a single set of primary 
data shown in Table 39, page 262, and the accompanying interpreta- 
tions illustrate one of the possible initial steps of analysis that the 
statistician takes in extracting significant information from the data.

Choose the best graphic method: His next step is to determine
which type of graph is best suited for presenting the relation he 
has determined is most important. There is often more than one way 
of picturing a statistical relationship, whether numerical or non- 
numerical, but years of testing have put the stamp of approval on 
certain methods as particularly effective for each purpose. These will 
be the subject of the balance of this chapter and the major part of 
chapters XIV and XV. 

Draw the graph effectively: The final step is to plan the actual
arrangement and draw the graph. The factors that contribute to artistic 
effect and accuracy are more or less common to all kinds of graphs, 
and will be considered in chapter XIV after the detailed discussion of
the simple types of graphs and some of the more complex graphs. It
will be assumed throughout that the graphs are being prepared either 
for the business man who is not a trained statistician or for the general 
public. Graphs drawn for the use of statisticians should follow prac- 
tically the same principles, although with greater emphasis on accuracy 
of detail and less on pictorial effect. 

SIMPLE TYPES OF GRAPHS: METHODS AND PURPOSES OF EACH

In the ensuing discussion there is no intention of describing or even 
naming every kind of graph that has ever been used or that could be 
used. Only the commonly accepted types that have proved most suc- 
cessful in the presentation of statistical material 2 will be described in 
detail, and the major emphasis will be placed on the special purposes 
for which each is particularly well adapted. 

Maps 

The most obvious method for picturing spatial relationships is the 
map; it is more truly an actual "picture" of the facts than any other 
type of graph. There are many types of maps, ranging from the ordi- 
nary outline map which shows only geographical boundaries of land 
and water to so-called "distortion" maps at the other extreme. It is 
not the purpose of this section to describe all the possible maps that 
may be useful for various purposes, but only to point out the necessary 
features of maps that are definitely statistical. A map will be considered 
statistical only when quantitative relationships are represented by some 
pictorial device instead of in numerical form. 

Location Maps. Outline maps form the background of statistical
maps, and are also frequently used in business for non-statistical pur- 
poses. For example, a sales manager interested in knowing where each 
of his salesmen is working from day to day may move colored pins 
about on an outline map of the territory. But this is not a statistical 
map because no numerical information is involved. 

2 Business men use many types of diagrammatic representation that are adaptations
of standard graphic methods. A variety of these, such as organization charts and Gantt
charts, are described in books on applied graphics. Space limitations have prevented the
inclusion of a comprehensive presentation of these usages in a textbook on statistical
method, although a few specialized forms appear in chapter XXVI.

Some location maps indicate non-numerical differences, such as
states having certain types of liquor control or those in which a certain
gasoline corporation has retail outlets. Different colors or shades of
cross-hatching are used in such maps, but unless the differences are in 
some way quantitative the map is not statistical, and its construction 
need not be governed by the rules given later for cross-hatched ratio 
maps. Many other location maps are found in print that present 
numerical information but are nevertheless not statistical. This is true 
of any map in which the quantities, ratios, etc., are simply inserted in 
each state or other subdivision in figures instead of being represented 
by shading or some other pictorial device. For most purposes such a 
map is harder to read than a table of the data, and no total visual 
impression is given of the spatial distribution of the numerical values. 

A map in which evenly dotted areas signify the principal oil fields 
in the United States is also merely a location map. However, if each 
dot should be located at a point where there was production of 10,000 
barrels of oil during a given period, the map would become statistical. 

Dot Maps. To show density: Such a map as the one last described 
is called a small-dot or point-dot map. The use of this type is illus- 
trated in Figure 36-A, which shows the location of filling stations in 
the United States in 1935. Each dot represents 20 filling stations in a 
county, and, since the information was available, they have been clus- 
tered within actual county limits, even though county boundaries are 
not shown on the map. In some states the dots are so close together 
that it is impossible to count them, but a clear impression is gained of 
the general distribution of filling stations throughout the United States, 
that is, of the relative density of filling stations in each state and in 
various sections of the country. If too large an amount is represented 
by each dot, they will be so scattered that no great density is apparent 
in any subdivision; on the other hand if the unit is too small certain 
areas become entirely black. There is no intention that the dots should 
be countable in any subdivision but the unit must be so chosen that the 
effect of density will be clearly brought out. 

To show quantity: If the actual totals in round numbers are wanted,
large dots are used instead of small dots, as illustrated in Figure 36-B.
A certain effect of density is also provided by this map, but the dots 
are grouped in blocks by rows and columns, to facilitate counting, 
instead of being distributed throughout each state. The unit repre- 
sented by each large dot is so chosen that no area will contain too many 
dots to count easily, and none will be too crowded to contain all the 
dots it should have. If dots must be drawn in the Atlantic Ocean to
represent Rhode Island, Connecticut, Delaware, and other small states,
the true spatial relationship is distorted.

FIGURE 36
FILLING STATIONS IN THE UNITED STATES, 1935

[A. Point-Dot Map Showing Density. One dot represents twenty filling
stations in a county. Reproduced from Market Research Series, No. 18,
United States Department of Commerce.
B. Large-Dot Map Showing Number. Each whole dot represents 500 filling
stations; each half dot, 250 filling stations. Data from Census of
Business, 1935.]

It should be noted that, except for the last dot in each state, each 
dot represents an exact amount. In this example, each solid dot stands 
for exactly 500 filling stations, except that the last solid dot stands for 
the round number 375-625 when no half dot has been added. When 
a half dot has been added, each whole dot represents exactly 500, but 
the half dot may represent any number from 125-375. There will 
never be more than one partial dot in any given area. Partial dots 
are sometimes subdivided into quarters; i.e., 3/4 black indicates 3/4 of the
total unit, 1/4 black indicates 1/4 of the total unit, and an empty circle may
be used for less than 1/4 of the unit amount. However, this practice tends
toward greater precision than is necessary in a graph. 
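
The rounding rule described above for whole and half dots can be sketched in a
few lines of Python. This is an illustration only: the function name
large_dots is hypothetical, and the treatment of totals that fall exactly on a
dividing point (125, 375, and so on, here rounded upward) is an assumption,
since the text does not say how such cases were handled.

```python
def large_dots(total, unit=500):
    """Return (whole_dots, has_half_dot) for a whole-number state total.

    The total is rounded to the nearest half unit (250 when the unit is 500).
    Totals exactly on a dividing point are rounded upward (an assumption).
    `unit` is assumed to be an even whole number.
    """
    half = unit // 2
    half_units = (2 * total + half) // (2 * half)  # nearest multiple of `half`
    return half_units // 2, half_units % 2 == 1

for total in (100, 300, 1200, 1400):
    print(total, large_dots(total))
```

With a unit of 500, a state having 1,400 filling stations receives three whole
dots, the last of which stands for 400 stations, while a state having 1,200
receives two whole dots and a half dot standing for 200.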

The use of large dots is similar to the method of equal-sized symbols 
illustrated in Figure 35-D, and practically the same effect would be 
secured if symbols of gasoline pumps were used in Figure 36-B instead 
of large dots. Other forms of large dot maps can be found in print, 
in which the large dots are not all uniform. For example, instead of 
5 equal-sized dots to represent 2,500 filling stations, a single dot 5 
times as large might be used. This method involves the difficulty of 
estimating the relative areas of circles. In other cases there may be 
an attempt to show two or more different sets of information on the 
same map by means of large dots that are equal-sized but in several 
colors or shadings. This method is likely to result in a confused 
picture of spatial relationships instead of one that stands out clearly. 
Any deviation, therefore, from the solid large dot as illustrated in 
Figure 36-B is usually unsatisfactory for the purpose of showing 
geographical distribution of quantities. 

Ratio Maps. Although both small- and large-dot maps usually 
represent absolute quantities, there is an implied ratio even in these, 
because the quantities are distributed with relation to the area of each 
subdivision. This is particularly true of the small-dot or density map. 
The actual space covered by the area of each state is, in a sense, the 
denominator, and the number of point dots in each is the numerator. 
The resulting effect of density becomes a pictorial representation of 
the ratio, number (of some unit) to area. In Figure 36-A, for example, 
density increases either when the denominator (state area) is decreased 
or when the numerator (number of dots or filling stations) is increased. 




However, the purpose of a statistical map is often to show ratios 
in which the denominators are some values other than areas; for ex- 
ample, retail sales data might be used to show percentage of change 
over the preceding year, or sales per capita. Some pictorial device 
other than dots must therefore be used to summarize these ratios for 
spatial comparisons between the different sections of the map. The 
most usual method is by shading or cross-hatching, as illustrated in 
Figure 37. 

Principles of cross-hatching: The types shown in the example are 
by no means the only possible kinds of cross-hatching, but they illus- 
trate the principles involved. (1) The smallest ratio, i.e., the least 
density of occurrence of the characteristic, is best represented by white, 
and the largest by solid black, since these afford the greatest range of 
contrast. (2) The degrees of density should be in unmistakably 
ascending order from white to black. (3) This gradation is ac- 
complished by increasing the heaviness of the lines, decreasing the 
space between them, or both; crossing them and filling in the inter- 
vening spaces are the next steps. Changing only the direction of 
the lines does not have any effect on the density of appearance. Alter- 
nate heavy black and white stripes may or may not appear darker 
than crossed or plaid effects and should therefore not be used in com- 
bination with them. The use of dots is undesirable for two reasons: 
it is often difficult to compare the density effect with that of lines, and 
there is danger of confusion with the point-dot type of map. 

Interpretation of a cross-hatched map: In Figure 37, the same data 
that were used in Figure 36-B are expressed as ratios to the population 
of each state. 3 There is no unmistakable impression to be gained from
a single glance at this map, except that in the states west of the Mis- 
sissippi River the ratio of filling stations to population tends to be 
higher than in the east. It is highest in the central farm states, in 
Washington and Oregon, and in Florida. The last named is most 
easily explained since the presence of many out-of-state cars produces 
a need for more filling stations than the native population would re-
quire. A study of the degree of car ownership and the miles of sur-
faced roads per capita in each state would aid in interpreting these
ratios, but even a knowledge of these total state data is not enough.
Neither will explain why such rural and relatively poor states as
Mississippi, Alabama, Georgia, Tennessee, and Kentucky should fall
in the same category as wealthy, industrial, and urban New York,
Pennsylvania, and Massachusetts.

3 Note that the purpose of this map is to show the density of filling stations with
relation to the population. In order to avoid fractions of filling stations, the denominator
has been expressed as "per 10,000 persons" instead of "per capita." An alternative
method for stating the ratios as whole numbers would be to invert them, using "number
of persons per filling station." If this were done, 400 persons per station would mean
greater density of filling stations than 1,000 persons per station. The lower ratio would
then more properly be represented by dark shading and the higher one by light, resulting
in the same total effect as in Figure 37.
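
The two ways of stating the ratio used for Figure 37, filling stations per
10,000 persons and persons per filling station, are reciprocal, and the
relation can be checked with a short Python sketch. The function names are
illustrative; the figures of 400 and 1,000 persons per station are those
mentioned in the note to the map.

```python
def stations_per_10000(stations, population):
    """Filling stations per 10,000 persons."""
    return stations / population * 10_000

def per_10000_from_persons_per_station(persons_per_station):
    """Convert the inverted ratio back to stations per 10,000 persons."""
    return 10_000 / persons_per_station

for persons in (400, 1000):
    print(persons, "persons per station =",
          per_10000_from_persons_per_station(persons), "stations per 10,000")
```

The smaller inverted ratio (400 persons per station) corresponds to the larger
direct ratio (25 stations per 10,000 persons), which is why the shading must
be reversed if the inverted form is used.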

It takes careful study of the situation to realize that there is one 
pertinent factor common to these two kinds of states: a large pro- 
portion of the population does not own cars, and most of the car 
owners drive only short distances daily. This is the reverse of condi- 
tions in the comfortably prosperous farm belt and also in the large 
western states where distances are great. In the largest cities, such as 
New York, Chicago, Philadelphia, Baltimore, and Boston, traffic con- 
gestion is so great that a small proportion of families owns cars in 
comparison with small town and rural families in the same income 
groups. Thus the state ratio of filling stations to population will be 
reduced accordingly in New York, Illinois, Pennsylvania, Maryland, 
and Massachusetts. In the poorer regions of the south, comprising the 
greater part of the white and lightly hatched states, incomes are in 
general too small to lead to car ownership. North Carolina with 
its growing industries and improved roads is an exception in 
this area. 

Because of these different factors that cause both numerators and 
denominators to fluctuate in the ratios depicted on this map, the pic- 
ture presented is not as clear cut as can sometimes be achieved when 
the rates or other ratios pictured are definitely affected by a single 
condition of topography, climate, transportation facilities, concentra- 
tion of population and industry, etc. In any case the planning of an 
effective cross-hatched map requires in a marked degree the co-ordina- 
tion between artistic ability and statistical judgment which is stressed 
in the last part of chapter XIV. The rates, prices, per cents, or other 
ratios should be so grouped that if there are any significant spatial 
relationships they will stand out in the finished product. The 
choice of the right groupings is a very important factor in achieving 
this end. 

Number and width of size groups: The first step in making a cross- 
hatched map is to work out the individual per cents or other ratios that 
are to be shown. These are next arrayed in order of size and studied
for any logical grouping that may be seen. If no natural dividing
points are obvious, the items should be grouped so that an approxi- 
mately equal number will fall in each category. In Figure 37, the 
number of states in each group is as nearly equal as the data permit 
(10, 11, 12, 9, and 7), although the size groups are of uneven width
(under 12, 12-17, 17-20, 20-22, and 22-27). This is a departure from
the rules that will be noted later in chapter XV for the presentation 
of frequency distribution tables and graphs. 4 

The number of groups has in this case been limited to five. More 
than six or seven kinds of cross-hatching are seldom effective on a map, 
and, in order to emphasize the contrasts, it may be desirable to have as 
few as three magnitude classes. 
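
The grouping procedure can be tried on a small scale. The Python sketch below
does two things: it assigns a ratio to one of the five size groups named for
Figure 37, and it finds dividing points that place about the same number of
items in each group. The list of ratios is invented for illustration and is
not the data behind Figure 37, the function names are hypothetical, and the
choice to treat each lower limit as belonging to the higher group is an
assumption, since the printed limits (12-17, 17-20) overlap.

```python
from bisect import bisect_right

# Lower limits of the second to fifth size groups in Figure 37.
LIMITS = [12, 17, 20, 22]
LABELS = ["under 12", "12-17", "17-20", "20-22", "22-27"]

def size_group(ratio):
    """Label of the size group into which a ratio falls (lower limit inclusive)."""
    return LABELS[bisect_right(LIMITS, ratio)]

def equal_count_limits(ratios, groups):
    """Dividing points that place about the same number of items in each group."""
    ordered = sorted(ratios)  # the array in order of size
    n = len(ordered)
    return [ordered[(k * n) // groups] for k in range(1, groups)]

# Invented ratios (stations per 10,000 persons), for illustration only.
sample = [9, 11, 13, 15, 18, 19, 21, 21.5, 23, 26]
print(equal_count_limits(sample, 5))
print([size_group(r) for r in (11.9, 12, 19.5, 22)])
```

Dividing points found in this way will seldom be round numbers, and in
practice they would be adjusted to convenient limits after inspection of the
array, as the uneven group widths in Figure 37 suggest.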

Special varieties of ratio maps: Certain conditions that are not con- 
fined within political boundaries may also be indicated by areas of 
cross-hatching. This kind of map is used chiefly to show belts of 
rainfall, crop conditions, etc., all of which are rates, per cents of 
normal, or other ratios grouped in class intervals. Non-statistical terms, 
as "good," "fair," and "poor" are sometimes used instead of a numer- 
ical measure for indicating crop, weather, or business conditions, but 
any such classes should be based upon numerical standards generally 
understood or defined in accompanying notes or text, and the rules 
stated above for ratio maps should govern the plan of shading. 

Flow Maps. These maps use a device not previously named, that 
is, numerical values are represented by the width of lines instead of 
by their length. The direction of the lines adds a non-numerical spatial 
relationship. This method has been found valuable chiefly in studies 
of traffic density. The same idea is followed in connecting areas of 
supply with areas of distribution. This is illustrated in Figure 38 
showing the flow of exports from the United States to Canada and 
to other continents. Each line represents 5 per cent of exports, and 
the width of the combined lines indicates the proportion going to 
each area. 
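
The number of lines to be drawn toward each destination follows directly from
its share of the total. A minimal Python sketch is given below; the function
name lines_for_share is hypothetical, and rounding halves upward is an
assumption. Only the share for Europe (one half) and the 1 per cent for
Australia are taken from Figure 38.

```python
def lines_for_share(percent, percent_per_line=5):
    """Number of lines of standard width needed to show a share of the total."""
    # Halves are rounded upward (an assumption about drafting practice).
    return int(percent / percent_per_line + 0.5)

print(lines_for_share(50))  # Europe: one half of exports
print(lines_for_share(1))   # a share too small for a line of its own
```

A share as small as 1 per cent rounds to no line at all, which may be why the
figure combines the exports to Australia with those to another area.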

An alternative method utilizes circles of varying sizes to express the 
quantities at the point of supply, with arrows indicating the direction 
of distribution. Either method may be employed in simple diagram- 
matic form instead of on a map as background. 

4 The entirely different graphic presentation of similar data in a frequency diagram 
requires class intervals of equal widths, whereas the number of items varies from class 
to class. See Bruce D. Mudgett, Statistical Tables and Graphs (Boston: Houghton
Mifflin Co., 1930), pp. 179, 187-89.



FIGURE 38
FLOW MAP: UNITED STATES EXPORTS, 1931

[Map legends: "One half of our exports goes to Europe"; "1% to Australia
included in Africa"; each line equals 5% of total value of exports from
the United States in 1931.]



Circle Graphs 

For certain purposes a circle graph is superior to bars or other 
linear measures. However, by its very nature it is adapted to but 
few usages. 

Parts of a Total. The circle is a classic symbol of unity; hence it is
ideal for representing total values. When divided into sectors, as in
Figure 35-B, the same visual effect is given whether these values are
expressed in per cents or as actual amounts. The actual total of cash
farm income was $8,000,000,000; the part contributed by crops was
nearly $3,200,000,000 or 40 per cent of the total, but whichever way
it is designated its sector will cover about 144 degrees, or 2/5 of the
circle. Furthermore, since the angle measured by a given number of 
degrees is the same regardless of the size of the circle, it is possible 
to use two or more circles whose areas 5 are proportional to compare 
the absolute amounts of several totals, yet the number of degrees in 
the corresponding sectors will also be comparable. This advantage is 
one not possessed either by linear bars or rectangular areas in repre- 
senting parts of several totals. However, there is a certain tendency
for the eye to measure the entire area of a sector instead of its angle.
Therefore, if the emphasis is on comparison of the proportions of
corresponding parts rather than on difference in total magnitudes, it
is probably better to use equal-sized circles.

5 The areas of circles are in proportion to the squares of their radii.
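
The arithmetic of the circle graph can be sketched in Python as follows. The
first function gives the angle of a sector from the part and the total; the
second gives the radius of a circle whose area is to be proportional to a
total, using the rule that areas vary as the squares of the radii. The
function names and the reference radius are illustrative assumptions; the
dollar amounts are those quoted above for cash farm income.

```python
import math

def sector_degrees(part, total):
    """Angle in degrees of the sector representing `part` of `total`."""
    return 360.0 * part / total

def radius_for_total(total, reference_total, reference_radius=1.0):
    """Radius of a circle whose area is proportional to `total`."""
    return reference_radius * math.sqrt(total / reference_total)

crops = 3_200_000_000        # "nearly $3,200,000,000" in the text
farm_income = 8_000_000_000  # total cash farm income

print(sector_degrees(crops, farm_income))  # 144.0
print(radius_for_total(4.0, 1.0))          # 2.0
```

A total four times as large therefore calls for a radius only twice as great,
while the angle of a 40 per cent sector remains 144 degrees in both circles.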

If the graph includes two or more circles, there should be a starting- 
point common to all of them, usually the radius extending upward 
from the center, although other quarter positions are also permissible. 
In each circle the various sectors should follow the same order, clock- 
wise, around to the starting-point. The cross-hatching of sectors usually 
indicates different attributes or geographical divisions rather than 
quantitative differences, as in the case of cross-hatched maps; hence 
an ascending scale of density is not necessary. A better contrast is 
achieved if dark sectors alternate with light ones, as in Figure 35-B. 
The kinds of cross-hatching to be used therefore need not be definitely 
distinguished in degrees of density so long as no two sectors look 
too much alike. Dots are permissible and concentric curved lines are 
also used. Diagonal lines, however, should follow the same direction 
on the page in each circle, regardless of the direction of the radii in 
the sector. A key to the cross-hatching may be used instead of printing 
the legends in each sector, particularly when there are two or more 
circles. 

Dial Indexes. During the depression decade when practically every 
measure of business conditions was below normal, it became customary 
to use circle diagrams to show percentage of normal, the entire circle 
representing 100 per cent. These graphs were easy to understand but 
basically incorrect. In this case 100 per cent is not a total or maximum 
value but merely indicates a normal or average condition; it is possible 
for any index to rise above 100, whereas a circle can never represent 
more than 100 per cent. After some attempts to show an extra "piece 
of pie" on top of the whole, or bulging out at one side, the dial form 
of index was developed. In these, each circle measures to 100, with 
100 at the top, but the scale extends around to 120 or 150 if necessary 
in place of 20 and 50. Several pointers from the center mark the value 
for this month, last month, last year, etc. This method of showing 
index values at selected periods is quite effective, easy to read and 
affords correct comparisons. Another variation, instead of reproducing 
the entire dial, shows only a section of it, as illustrated in Figure 39. 
This makes it possible to use a much larger scale dial without requiring 
the space for a large complete circle. 
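
The position of a pointer on a complete dial of the kind described above can
be computed as in the following Python sketch. It assumes a full circle of 100
index points with 100 at the top, so that 120 falls where 20 would otherwise
be; the function name pointer_angle is hypothetical, and the sketch does not
attempt to reproduce the partial dial of Figure 39, whose scale is not stated
in the text.

```python
def pointer_angle(index_value, full_circle=100.0):
    """Clockwise angle in degrees from the top of the dial."""
    return (index_value % full_circle) * 360.0 / full_circle

for value in (100, 120, 150):
    print(value, pointer_angle(value))
```

An index of 100 points straight up, 120 lies 72 degrees past the top, and 150
points straight down, in the position that 50 would occupy on a scale ending
at 100.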



FIGURE 39
DIAL CHART: INDEX OF INDUSTRIAL ACTIVITY AS OF MAY 31, 1941

ASSOCIATED PRESS INDEX OF INDUSTRIAL ACTIVITY
COMPONENTS:

                      THIS WEEK    LAST WEEK
AUTO PRODUCTION         129.0        130.0
STEEL OUTPUT            137.5        137.7
COTTON MFG.             170.0        166.0
ELECTRIC POWER          144.4        143.3
RESIDENTIAL BLDG.       101.8        105.0
RAILWAY FREIGHT          90.8         91.3

Reproduced from Buffalo Evening News.

Linear Graphs 

Practically any numerical value can be represented by the length of a 
line, and consequently bars and other adaptations of linear graphs are 
more widely used than any other form of statistical graph. The more 
complicated forms, involving time and quantitative relationships will 
be described in the next two chapters. This section deals only with 
the simpler types in which there is but one scale, which may extend 
either vertically or horizontally. 

Pictograms. As has already been suggested, a pictogram of the 
kind shown in Figure 35-D is the simplest form of linear diagram. It 
dates from prehistoric times, when primitive peoples drew five canoes 
or ten moons to represent such concepts as quantities or the passage 
of time. Where standardized symbols of very simple form, all the same 
size and evenly spaced, are shown in rows, each symbol represents a 
certain number of original units and several groups of them can be 
compared according to the lengths of the rows. The example in 
Figure 35-D appears to have no scale, but actually there is one, 
horizontally. The advantage of this graphic device as a means of 
distinguishing attribute classifications has already been noted. When
the method is carried to the extreme of using several variations of the
symbol in each row and other symbols at the ends of the rows to
indicate the different attribute classifications, as illustrated in Figure
40, the graphic requirements of clarity, simplicity, and effectiveness
are likely to become lost in confusion.

Many pictogram forms that enliven the reports of governmental or 
other agencies do not fall within the scope of a discussion of statistical 
graphs; they may succeed in attracting attention but they do not 
convey numerical ideas. If they do aim to present statistical material 
the only acceptable method is the linear form, made up of identical 
symbols. No statistical misconception can result from such graphs, as 
does occur through the use of symbols or pictures of different size or 
those not identical in form. The rules governing the use of pictograms
can be summed up as follows:

FIGURE 40
PICTOGRAM: NUMBER OF WORKERS IN BASIC FIELDS OF EMPLOYMENT, 1940

[Pictogram titled "What Kind of Work Do They Do?" with one row of symbols for each field: manufacturing and mining; agriculture; wholesale and retail trade; governments, civil and military; and others (transportation, finance, service, and other). Each symbol represents one million workers. Prepared by Pictograph Corporation.]

Reproduced from New York Times Sunday Magazine, April 13, 1941.

1. Symbols should be self-explanatory.

2. Larger quantities are shown by a larger quantity of symbols, not 
by larger symbols. 

3. Charts compare approximate quantities, not minute details. 

4. Only comparisons should be charted, not isolated statements. 6 
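
The second and third rules can be illustrated with a short Python sketch. It is only an illustration, not a method given in the text: the function name `pictogram_row`, the symbol "O", and the worker counts are all hypothetical. Each quantity is converted to a whole number of identical symbols, so that a larger quantity produces a longer row rather than a larger symbol.

```python
def pictogram_row(label, quantity, unit_per_symbol, symbol="O"):
    """Return one pictogram row made of identical, evenly spaced symbols.

    The number of symbols is the quantity divided by the value of one
    symbol, rounded to a whole symbol (pictograms compare approximate
    quantities, not minute details).
    """
    count = round(quantity / unit_per_symbol)
    return f"{label:<14}" + " ".join([symbol] * count)


# Hypothetical worker counts in millions (not the Figure 40 data).
workers = {"Field A": 11, "Field B": 6, "Field C": 3}
rows = [pictogram_row(name, n, unit_per_symbol=1) for name, n in workers.items()]
print("\n".join(rows))
```

Because every symbol is the same size, the rows can be compared by length alone, exactly as bars would be.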

Bar Graphs. In Figure 35-A plain continuous bars serve the same
purpose as rows of symbols. However, to most readers the bars are just
as easy to understand if not more so.

Single bars: Single bars are often used to depict the separate classes
of an attribute classification. They are also suitable for showing geo-
graphical classifications in which the spatial relationship of one class to
another is not so important that a map is required; the three cities in
Figure 35-A are an example. Such bars may also indicate values at
selected periods of time that do not constitute a continuous time series. 
Each bar always represents a number or value for a single attribute or 
place or period. The bars are arranged along a base line which has no 
scale but which is labeled just like the stub of a table. The same prin- 
ciples as in tabulation determine the order of arrangement of the bars. 
That is, they may be in ascending or descending order of size, alpha- 
betical, or in any other logical order. 

Groups of bars: As in a table, bars are often arranged in sub- 
classifications grouped according to whatever emphasis is desired. Solid 
black may be used for all the bars, whether single or in groups. When 
there are several subgroups, each containing the same classes of items, 
cross-hatching is frequently used to identify corresponding bars in each 
group. As in the case of circles and sectors, this hatching is merely a 
means of distinguishing each attribute from the others, and no order 
of density is prescribed except that the same order should be followed 
in all the groups. Figure 41-A uses this method to compare the num- 
ber of wage earners in three leading food industries, at three census 
periods. 

Divided bars: In Figure 41-B 7 each bar is subdivided into a number 
of segments. This type of bar graph is used for the same purpose as 
the circle and sectors, to indicate parts of a whole, and the rules for 
cross-hatching the parts are the same as for the sectors of a circle. In 
planning the graph, it must first be determined which is more im- 
portant, to present an accurate picture of the total values, or to afford 

6 Rudolf Modley, How To Use Pictorial Statistics. New York: Harper & Bros., 1937. 

7 Reproduced from Survey of Current Business (December, 1937), p. 13. 



FIGURE 41
TYPES OF BAR GRAPHS

[Four panels. A: groups of bars, number of wage earners (thousands) in three leading food industries (bakery products, meat packing, canned fruits and vegetables), U.S. Census of Manufactures, 1914, 1929, 1937. B: divided bars, occupational distribution (wage earners, clerical, business and professional) of nonrelief families sampled by Departments of Agriculture and Labor, 1936, for 1 metropolis (Chicago), 3 large cities, 9 middle size cities, 25 small cities, and 107 villages. C: separate bars from a zero base, per cent of nonrelief families in large cities in each occupational group, 1936. D: duo-directional bars, per cent of families above and below $750 annual income, rural and urban, Negro families and white families.]

a correct cross-comparison of the proportionate distribution of parts. 
If the former, the original units must be shown on the scale and the 
bars will be of varying total lengths. If the latter, as in the illustration, 
the scale will be in per cents and, if all the parts are included in the 
graph, all bars will have the same total length, 100 per cent. When 
part-to-part comparisons are wanted in a single distribution the total 
bar may be cut into parts, each one starting from the zero base, instead 
of being arranged consecutively in one divided bar. This method is 
illustrated in Figure 41-C, which is a rearrangement of one of the 
divided bars in Figure 41-B. When the parts of only one total are 
shown, as in this case, the graph would present a better appearance 
if all the bars were solid black. However, if all of the bars of B were 
reproduced in the form of C, cross-hatching would be necessary and 
the entire graph would resemble the groups of bars in Figure 41-A. 

Duo-directional bars: In this type of linear graph, the single scale 
extends in both directions from zero. It has two main uses: (1) to 
show percentage change among a set of comparable items some of 
which may have increased while others decreased, and (2) to com- 
pare values classified according to two contrasting attributes, such as 
male and female, Republican and Democrat. The example in Figure 
41-D 8 is a modification of this second usage and also of the types 
shown in 41-A, B, and C. The contrasting attributes are incomes above 
or below a certain standard, $750 per year. There are two groups of 
bars, urban and rural, each group containing a bar for white and a 
bar for Negro families. Each of these four bars is actually a divided 
bar, that is, its total length represents 100 per cent, divided into the 
percentage of families having an income above $750 per year and 
the percentage below $750. The bars are aligned at this dividing point 
as zero instead of showing their total length measured from a common 
base. The method is very effective in emphasizing the particular com- 
parison that is wanted in this case. Figure 76, page 548, is an 
example of a duo-directional bar diagram showing increases and 
decreases. 

Essential features of bar diagrams: Unbroken Scale: A bar graph 
gives an accurate impression only when the relative lengths of the 
bars are correctly represented. It follows that the scale in every case 
must start at zero and continue as high as the highest value to be 

8 The original form of this graph was a pictogram, in the Consumers' Guide, United 
States Department of Agriculture, September, 1938. 




shown on the graph. If it starts at some point beyond zero, or if any 
break is made in the scale, all of the bars become shortened by equal 
instead of proportional amounts, and consequently a misleading pic- 
ture is given of the relative total lengths of the various bars. Some 
statisticians consider that any labeling or figures at the far end of 
each bar, or on the bars, also interferes with the estimate of their 
lengths. However, if the figures are inserted near the zero end of the 
bars, there can be little objection. In the case of horizontal bars, such 
inserted figures appear as a column, taking the place of an accom- 
panying table. 
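
A small numerical sketch in Python may make the point about equal versus proportional shortening concrete. The two values and the starting point of 100 are hypothetical, chosen only for illustration, and the function name `drawn_lengths` is not from the text.

```python
def drawn_lengths(values, scale_start=0.0):
    """Length of each bar as drawn when the scale begins at scale_start."""
    return [v - scale_start for v in values]


values = [120.0, 150.0]            # hypothetical data; the true ratio is 1.25

full = drawn_lengths(values)       # scale starts at zero
cut = drawn_lengths(values, 100)   # scale starts at 100: each bar loses 100 units

true_ratio = full[1] / full[0]     # 150 / 120
shown_ratio = cut[1] / cut[0]      # 50 / 20
print(true_ratio, shown_ratio)
```

Both bars are shortened by the same amount, 100 units, yet the second bar now appears two and one-half times as long as the first instead of one and one-quarter times as long.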

Shading, and Spaces between Bars: Bars are most effective when
solid black. White narrowly outlined in black is least effective since, 
unlike sections of a map or circle, every bar is entirely surrounded by 
a white background. Adjacent bars must be separated by white spaces, 
usually slightly narrower than the bars themselves. If they were not 
separated an effect of area rather than length would result, and it 
would be hard to estimate correctly the lengths of individual bars. 
Somewhat wider spaces are used to separate groups of bars, or each 
group may be boxed separately in a complete border, as in Figure 41-A.

Scale and Labels: In the simple types of bars thus far described
there is only one scale but this scale should always be shown. There 
is no fixed rule as to whether the bars should extend horizontally or 
vertically. If they are vertical, the scale is often repeated on the right 
side as well as on the left, and if horizontal it may be either at the top 
or bottom, or both. The horizontal position is usually more con- 
venient for graphs containing only a few bars. It is possible then to 
print the label of each bar at the left of the vertical base line, with 
the numerical value also if desired, and sufficient space can be allowed 
in a natural horizontal position for all labels, figures, and scale values. 
However, when the bars represent a time classification, even though 
not constituting a time series, it is customary to draw them in a vertical 
position, as illustrated in Figure 41-A, to correspond with the accepted
form for time series bars, which will be explained in the next chapter. 

INTRODUCTION TO TWO-DIMENSIONAL LINEAR GRAPHS 

Every graph drawn on a plane surface has, of course, two dimen- 
sions. In this text, however, the simple types of linear graphs that have 
been discussed in the preceding section are distinguished from the 




more complex types in which there are two scales of values, one ex- 
tending horizontally and the other vertically. 

Definitions 

The term "two-dimensional" 9 will be applied to the latter type of 
graphs. Whenever the data consist of series of two or more variables, 
a two-dimensional linear graph must be employed. Before describing 
the principles and methods of constructing this kind of graph, some 
definition of terms becomes necessary. 

Variables. This word has occasionally been used in preceding 
chapters, without definition. According to Day, 10 "A variable is any- 
thing which exhibits differences of magnitude or number." It is used 
to refer to any column or row of data indicating changes in number or 
value of the particular unit named in the heading, e.g., in Table 15, 
page 149, the price of wheat is a variable. An ordered classification 
is also considered a variable. That is, classifications by size groups, time 
classifications according to regular periods or intervals, and even quali- 
tative attributes that can be arranged in numerical order, such as age 
groups, are all variables. In Part A of the wheat-price table, the stub 
classification "time in weeks" is therefore a variable; the classifications 
in Parts B and C are not variables, however, since neither "grades of
wheat" nor "market" depends upon differences in magnitude or number.

Statistical Series; Dependent and Independent Variables. When 
two such variables are shown in relation to one another the resulting 
table of data is a statistical series. Thus there may be series in time or 
series according to attribute. 11 

Variables in a series are further defined as "dependent" and "inde- 
pendent," the unit usually being the dependent variable and the 
classification the independent variable; e.g., the price of wheat is the 
dependent variable and time in weeks is the independent variable. 
Under some conditions both variables could be considered independent, 
in which case either one could be classified in terms of the other. For 
example, instead of quoting prices according to the time classification, 



9 Not to be confused with "duo-directional" in which a single scale extends both 
positively and negatively from zero; nor with "double or multiple scale," a term to be 
introduced later referring to two or more scales of units both measured vertically. 

10 Edmund E. Day, Statistical Analysis (New York, 1925): The Macmillan Co., 
p. 10. 

11 Series in space are also suggested by Day (ibid., p. 45), but this possibility seems 
not wholly consistent with his definition of a variable, since no geographic classification 
"exhibits differences in magnitude or number." 




weeks, a frequency distribution of the same data might use price range 
as the stub classification, the unit being the number of weeks in which 
each price was quoted: 

PRICE RANGE       NUMBER OF WEEKS
$.60-.649                3
 .65-.699               10
 .70-.749               22

etc. 
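
The reclassification just described, from prices listed by week to weeks counted by price range, can be sketched in Python as follows. This is only an illustration under stated assumptions: the weekly prices are hypothetical (they are not the wheat prices of Table 15), prices are held as whole mills (tenths of a cent) to avoid rounding trouble, and the function name `weeks_per_price_class` is invented for the example.

```python
from collections import Counter


def weeks_per_price_class(prices_in_mills, low=600, width=50):
    """Count the weeks falling in each price class.

    Prices are integers in mills (tenths of a cent), so 612 means $.612.
    Classes are `width` mills wide, starting at `low`: 600-649, 650-699, ...
    """
    counts = Counter((p - low) // width for p in prices_in_mills)
    table = []
    for k in sorted(counts):
        lo = low + k * width
        hi = lo + width - 1
        table.append((f"${lo / 1000:.2f}-{hi / 1000:.3f}", counts[k]))
    return table


# Hypothetical weekly prices, in mills.
weekly_prices = [612, 655, 701, 689, 648, 720, 733, 667]
for price_range, weeks in weeks_per_price_class(weekly_prices):
    print(price_range, weeks)
```

Price range is now the stub classification and the number of weeks is the unit, which is the reversal of roles described above.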

In the discussion of classification in chapter VIII it was stated that 
from the point of view of tabular arrangement no distinction need be 
made between series and other kinds of classification. In graphic 
development, however, the treatment for a series of two variables, 
whose relations to one another are of primary importance, is quite 
different from the methods that have been described in this chapter 
for a single variable classified non-numerically. 

Two-Dimensional Scales. In order to measure graphically the 
values of the two variables in a series, two scales are required on two 
axes that intersect at right angles. The vertical axis corresponds to 
the single scale described in the preceding section on linear graphs and 
is used to measure the number of units in the dependent variable, that 
is, the numerical values of the data themselves. In other words, it 
represents the figures that appear in the rows and columns of a table. 
The horizontal scale in similar fashion measures the independent 
variable, that is, the numerical values of the classification. 12 

Ordinarily such a graph uses only one quarter of the field covered 
by the intersecting axes, that is, the "positive" field above and to the 
right of the point of intersection. If negative values are necessary on 
the vertical scale, the field below the intersection must also be shown, 
and similarly the left-hand field for negative values on the horizontal 
scale. For use in formulas and diagrams, the dependent variable is 
referred to as Y and the independent variable as X. The Y variable 
is called a function of the X variable. The student will do well to 
familiarize himself with all these uses of the term "variable" because 
they become increasingly important in advanced statistical analysis. 

Types of Statistical Series and Their Graphs 

When the classification indicates quantitative attributes, the units in 
the dependent variable are called "frequencies." Graphic methods for 

12 Certain exceptions to this rule will be found in some special purpose graphs, such 
as price curves in the field of economics. 




illustrating analysis of this kind of series will be described in chapter 
XV. Series in which two quantitative attribute classifications are re- 
lated to one another lead to analysis by correlation, the subject of 
chapter XXVII. The principles of time series and the methods for 
representing them graphically are discussed in the first half of 
chapter XIV. 

In the majority of texts it is customary to reserve the use of the 
word "series" for time series, and to refer to other kinds of series as 
"groups." Consequently that practice will be adhered to in subsequent 
chapters of this book. The general term "series" was introduced in 
this section to point out a basic similarity in all data involving func- 
tional relationships between two variables. A two-dimensional scale is 
necessary for graphic representation of any such relationships. 

PROBLEMS 

1. What is the principal advantage of a graph as contrasted with a table for 
presenting information ? 

2. State briefly the relations shown in each part of Figure 35, page 299. 

3. Find in print a graph of one of the types presented in Figure 35. Analyze 
on the basis of our text the steps followed by the author of the graph in 
its preparation. 

4. Select from any issue of the Statistical Abstract tables which should be 
represented by each of the four types of statistical maps described in the 
text. Explain why each set of data could be represented best by the type of 
map you have selected. 

5. a) Present the information given below in the form of circles and sectors.
b) Discuss the difference in use of fuels by iron and steel, and by all
industries.

COST OF INDUSTRIAL CONSUMPTION OF PURCHASED ELECTRICITY AND OTHER TYPES
OF FUEL, FOR ALL INDUSTRIES, AND FOR IRON AND STEEL, 1929*

                              COST (MILLIONS OF DOLLARS)
TYPE OF FUEL               ALL INDUSTRIES    IRON AND STEEL
Total                          1,973.9            463.1
Purchased electricity            719.5            128.2
Bituminous coal                  754.5             87.2
Anthracite coal                   43.6              2.5
Coke                             243.7            198.2
Fuel oils                        212.6             47.0

* 1930 Census of Manufactures, Vol. 1, p. 161.






6. Given the following information concerning the age distribution of all
persons over 10 years of age in the United States and those gainfully
employed in each group:

                                                      MILLIONS
                                                1900    1920    1930
Total population 10 years of age and over       57.9    82.7    98.7
  Total gainfully employed                      29.1    41.6    48.8
Total population 10-15 years of age              9.6    12.5    14.3
  Total gainfully employed                       1.7     1.1      .7
Total population 16-44 years of age             34.7    48.1    56.3
  Total gainfully employed                      20.2    28.9    33.5
Total population 45-64 years of age             10.4    17.0    21.4
  Total gainfully employed                       5.8     9.9    12.4
Total population 65 years of age and over        3.1     4.9     6.6
  Total gainfully employed                       1.2     1.7     2.2

a) Study the changes in the age composition of the employed population 
during the 30-year period, and draw a graph to illustrate your con- 
clusions. 

b) Study the changes in the percentage of each group gainfully occupied, 
and draw a graph to illustrate these changes. 

c) Write an interpretation of the data illustrated by your graphs.

7. a) Present the following information graphically.
b) Discuss the nature of changes in this business during the ten-year period.

SALES IN A COUNTRY GENERAL STORE ACCORDING TO
TYPE OF GOODS, 1930 AND 1940

TYPE OF GOODS           1930        1940
Groceries             $19,650     $21,410
Meats                     400         975
Shoes                   1,125         630
Rubber footwear           760         925
Dry goods               1,650         310
Notions                 1,850       1,025
Hardware                  925       1,070
Drugs                     425         115



REFERENCES 
(See page 348) 



CHAPTER XIV 



GRAPHS Continued 

TIME SERIES GRAPHS 

THE passage of time is most naturally pictured by a moving point 
whose apparent course may be traced from left to right. This 
basic idea is utilized by all types of time series graphs.

Bar Graphs

A time series may be represented by a row of bars, the height of 
each bar representing a single value, exactly as in the one-dimensional 
graphs. However, the order of arrangement no longer depends on 
judgment or arbitrary choice; the bars must stand at evenly spaced 
intervals along the base scale, the units of which represent successive 
equal periods of time. The fluctuations of the dependent variable may 
be followed through the path marked by the tops of the bars. This is 
illustrated in Figure 42, which shows the changes in value of United 
States exports annually, 1922-39. 

FIGURE 42

BAR GRAPH OF TIME SERIES
VALUE OF UNITED STATES EXPORTS, 1922-39

[Bar graph: one vertical bar for each year, 1922-39; vertical scale in billions of dollars.]

Data from Statistical Abstract



Band, Strata, or Surface Graphs 

Bars that represent a time series may be divided into several 
components. A long row of divided bars gives an effect of wavy hori- 
zontal bands or strata. To serve the same purpose, these several com- 
ponent parts of the variable are frequently shown as a continuous 
"surface" instead of in separate bars. The strata or bands in the 
surfaces are cross-hatched the same as divided bars, and show by con- 
trast the changing proportions of the parts of a total over a continuous 
period of time. 

This type of chart may be designed to show percentage distributions, 
in which case the graph consists of a rectangle completely filled in 
with bands of fluctuating width. If the scale represents actual values 
instead of per cents, the upper boundary of the surface will be irregular, 
representing the actual total at each period. The two types are illus- 
trated in Figure 43-A and 43-B, both of which show the same total 
data as Figure 42, divided into component parts. Whether per cents 
or actual values are shown, there is some danger that the bands may 
take on a distorted appearance due to sudden extreme fluctuations in 
some of the parts. The width of the band must be estimated by the 
vertical distance between its boundaries at 
each point. In the illustration at the right 
the width is actually the same throughout 
the period, but due to various angles of 
change in its lower boundary, it is pulled 
out of shape. If the data do not contain too 
many sudden changes, this distortion may 
be reduced to a minimum by charting at the 
bottom of the graph the narrowest band 
and others having but slight fluctuation, so 
that succeeding bands will have a lower 
boundary that is fairly level. The upper one, 
that which fluctuates the most, will then not 
affect the shapes of the other layers. 

The smoother strata can be located near 
the top and bottom in a 100 per cent graph. The jagged edges of the 
two most irregular strata will then fit together near the center, their
widths being measured from the straighter edge of each. In some cases, 
however, a required sequence of the component parts based on other 
considerations of the data will determine the order of the bands. 
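
The ordering rule for a graph of actual values can be sketched in Python. The sketch rests on assumptions not found in the text: the degree of fluctuation of each component is measured by its standard deviation (the text names no particular measure), the component names and values are hypothetical, and the function name `stack_bands` is invented. Components are stacked from the steadiest at the bottom to the most variable at the top, and the upper boundary of each band is the running total.

```python
from statistics import pstdev


def stack_bands(components):
    """Order components from least to most variable and cumulate them.

    Returns the stacking order (bottom to top) and, for each component,
    the upper boundary of its band at every period. The width of a band
    at any period is the vertical distance between its upper boundary
    and the boundary of the band beneath it.
    """
    order = sorted(components, key=lambda name: pstdev(components[name]))
    n_periods = len(next(iter(components.values())))
    base = [0.0] * n_periods
    boundaries = {}
    for name in order:
        base = [b + v for b, v in zip(base, components[name])]
        boundaries[name] = base
    return order, boundaries


# Hypothetical component values for four periods (not the Figure 43 data).
parts = {
    "steady": [1.0, 1.1, 1.0, 1.1],
    "moderate": [2.0, 2.5, 2.0, 2.6],
    "volatile": [1.0, 3.0, 0.5, 2.5],
}
order, bounds = stack_bands(parts)
print(order)
print(bounds[order[-1]])  # top boundary of the surface: the total at each period
```

With this arrangement the lower boundaries of the first bands are nearly level, and only the uppermost band carries the sharp fluctuations.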




FIGURE 43
BAND GRAPHS OF TIME SERIES

VALUE OF UNITED STATES EXPORTS ACCORDING TO MAJOR ECONOMIC CLASSES,
1921-39

[Two panels, each with the years 1921-39 on the horizontal scale. A: cumulative per cents, vertical scale in per cent up to 100. B: cumulative dollar values, vertical scale in billions of dollars.]

Data from Statistical Abstract

A well-planned band, strata, or surface chart is a valuable means for 
showing the changing relations of the component parts to the whole 
in a time series. On the other hand, if the primary purpose is to 
place emphasis on time comparisons between individual parts, or 
between separate totals, a line graph is preferable. 

Line Graphs of Time Series 

Graphs of this type are unquestionably more widely used than any 
other in every phase of government and business statistics. They are 
constructed in exactly the same way as simple bar graphs of time 
series, except that, instead of drawing a bar for each time period, only 
the point at the upper end of each bar is plotted. 1 The successive 
points are then connected by straight lines whose combined length 
may have a more or less jagged appearance, depending on the irreg- 
ularity of the data. Regardless of its degree of smoothness, this con- 
tinuous line is called a "curve." The chief advantage in using curves 
instead of bars is that several curves may be shown for comparison 
on the same graph more easily than several sets of bars. 

Curves of time series may serve either of two purposes: to show 
actual amounts of change or to show relative changes. Whichever 
function is more important will determine whether the vertical scale 
should be arithmetic or logarithmic. 

Arithmetic Scale. An arithmetic scale in which equal spaces on 
the scale stand for equal amounts of the unit is familiar to everyone. 
It should be used for the vertical scale whenever a comparison is 
wanted between actual amounts of the unit either for a single series 
at different time periods or for several series at corresponding periods. 

Methods of comparing several series: A problem arises, however, in 
comparing two or more series that are recorded either in different units, 
or in the same unit at levels so far apart that it is difficult to use the 
same scale effectively for both. The purpose of such a graph must be 
carefully considered before choosing one of the alternative graphic 
methods for dealing with this situation. Figure 44, A, B, and C, shows 
three ways of handling the same data on arithmetic scales. 

1 In mathematical language, the positions assigned to values of the independent
variable along the horizontal axis are called "abscissas" and the values of the dependent 
variable assigned along the vertical axis are called "ordinates." The plotting of any 
value of the dependent variable consists in (1) determining the position of the independent 
variable on the horizontal scale (abscissa) and the value of the data on the vertical scale 
(ordinate) ; (2) locating the point of intersection of a vertical at the abscissa position and 
a horizontal at the ordinate value. 




1. Single Unit Scale: If the primary purpose is to compare absolute 
amounts at each period, a single 2 unbroken arithmetic scale is best 
even though it minimizes the fluctuations of the series having the 
smaller values. In this case, if the purpose is to show that coal is still 
the major source of power, as compared with oil, Figure 44-A is the 
one to use. 

2. Index Numbers: If a comparison of relative changes of the vari- 
ables will serve the purpose, and particularly when two or more series 
do not have a common unit, they may be reduced to indexes, using a 
corresponding base 3 period in each case. The 100 per cent line and the 
entire per cent scale will be common to the several series. Figure 44-C 
shows the relatively greater percentage variation of oil as a source of 
power, 1906-10 being chosen as the base period. 

3. Scale Equation: If it is essential to depict on the graph the actual 
rather than the relative values, and at the same time to show on equal 
terms the degree of fluctuation in each series, some form of scale equa- 
tion must be utilized. Several methods are commonly used but the 
only one that is justified statistically 4 is the equation of the several 
series of values so that their respective averages for the period will 
approximately coincide on the graph. The method is as follows: (a) 
Find the arithmetic average of each series, in its own unit, for the entire 
period. (b) Find the ratio between the two averages. In this example,
the averages were 8.7 quadrillion B.T.U. for coal consumption and 1.62
quadrillion B.T.U. for oil, or roughly one unit for oil to five for coal.
(c) Separate vertical scales are drawn on either side 5 of the graph, both
starting at zero. In Figure 44-B, coal is on the left and oil on the right.
(d) The same space that represents five units on the left (coal)
represents one unit on the right (oil), and the approximate averages
of the two coincide. (e) In arranging the equated scales, they should
not be condensed so much that fluctuations are underemphasized, but 
the maximum values of both sets of data must be provided for. 
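
The arithmetic of steps (a) through (d) may be sketched in Python. Only the two averages, 8.7 and 1.62, come from the text; the short coal and oil series are hypothetical and serve merely to exercise the functions, whose names (`scale_ratio`, `to_left_scale`) are invented for this illustration.

```python
def mean(values):
    """Arithmetic average of a series."""
    return sum(values) / len(values)


def scale_ratio(left_series, right_series):
    """Units on the left scale occupying the same space as one unit on the right."""
    return mean(left_series) / mean(right_series)


def to_left_scale(right_value, ratio):
    """Height, in left-scale units, at which a right-scale value is plotted."""
    return right_value * ratio


# Averages quoted in the text, in quadrillion B.T.U.: coal 8.7, oil 1.62.
ratio = 8.7 / 1.62
print(round(ratio, 2))  # a little over five, taken as five in the text

# Hypothetical short series, used only to show that the averages coincide.
coal = [8.0, 9.0, 10.0]
oil = [1.0, 2.0, 3.0]
r = scale_ratio(coal, oil)
print(to_left_scale(mean(oil), r), mean(coal))
```

When the exact ratio of the averages is used, the average of the right-hand series is plotted at precisely the height of the left-hand average; rounding the ratio to a convenient figure, as in the example of five to one, makes the two coincide only approximately.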

2 This method, of course, is available only when the two series are in the same 
unit or can be reduced to a common unit. Otherwise the comparison of absolute amounts 
on a single scale is out of the question. 

3 For the choice of base, and interpretation of the comparison, refer to the discussion 
of index numbers in chapter XIX, pages 483-86 and page 498. 

4 The suggestion has been made by some statisticians that the scales be equated on 
the basis of dispersion. However, the entire purpose of scale equation is to compare 
the degree of fluctuation between two series of data, and if this dispersion is equalized 
there appears to be no useful comparison shown by the graph. 

5 If three series must be equated it is necessary to have three separate scales, each
properly labeled. Customarily two of the scales are placed at the left and one at the right,
but occasionally all three will be found at the left. With more than three series the graph
becomes too involved to read easily.



FIGURE 44
LINE GRAPHS OF TIME SERIES

SUPPLY OF POWER FROM COAL AND DOMESTIC OIL
ANNUAL AVERAGES FOR FIVE-YEAR PERIODS, 1871-1935

(All figures represent equivalent British Thermal Units, in quadrillions)

[Four panels, each showing a curve for coal and a curve for oil by five-year periods from 1871-75 to 1931-35. A: absolute amounts, arithmetic scale. B: two arithmetic scales equated to annual averages of 13 five-year periods. C: index numbers on 1906-10 base. D: semi-logarithmic chart.]

Data from Statistical Abstract



As a result of this method, each variable is given equal emphasis 
in terms of fluctuations from its own average, and they may be com- 
pared accordingly. For example, Figure 44-B shows that after 1916-20 
oil-produced power expanded sharply above its average level, while in 
the same period coal-produced power fluctuated mildly with a slight 
tendency to decline toward its average level. The importance of the 




increase in oil-produced power during this period is completely con- 
cealed in Figure 44-A. Specifically the increases in coal-produced and 
oil-produced power appear to be equally important in 44-A but 44-B 
shows the actual relation in the growth of use of the two fuels. 6 

It should be noted that if index numbers are computed on an average 
of the period as a base, the shapes and relative locations of their curves 
will be exactly the same as by this more cumbersome method of scale 
equation. The only advantage of scale equation over index numbers 
is that actual amounts instead of percentages can be read from the 
graph. 7 
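
The equivalence of the two methods can be verified with a brief Python sketch. The series are hypothetical, the function name `index_on_average_base` is invented, and the exact ratio of the averages is used rather than a rounded one. Each index number is the value divided by the average of its own series and multiplied by 100; multiplying every index by a single constant then reproduces the heights obtained by scale equation.

```python
def index_on_average_base(series):
    """Index numbers with the average of the whole period as base (= 100)."""
    base = sum(series) / len(series)
    return [100.0 * v / base for v in series]


# Hypothetical series in different magnitudes.
coal = [8.0, 9.0, 10.0]
oil = [1.0, 2.0, 3.0]
mean_coal = sum(coal) / len(coal)
mean_oil = sum(oil) / len(oil)

# Scale equation: oil plotted on the coal scale so that the averages coincide.
oil_equated = [v * mean_coal / mean_oil for v in oil]

# Index numbers, stretched by one constant so that 100 falls at the coal average.
k = mean_coal / 100.0
oil_from_index = [k * i for i in index_on_average_base(oil)]
coal_from_index = [k * i for i in index_on_average_base(coal)]

print(oil_equated, oil_from_index)
print(coal, coal_from_index)
```

Since both curves are multiplied by the same constant, their shapes and relative positions are unchanged; only the figures printed on the scale differ.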

4. Need for Logarithmic Scale: A comparison of relative rates of 
change, or relative increases or decreases from one period to another 
is likely to be more important than any of the three purposes named 
above. Such comparisons cannot be shown satisfactorily on an arithmetic 
scale, even by means of index numbers. Changes in the latter must 
always be studied with reference to a certain base period and cannot 
be shown equally between any two periods or at various levels on 
the scale. This is because equal amounts, or spaces, on an arithmetic 
scale represent constantly decreasing percentage changes as the values 
of the scale increase. For example, in Figure 44-C the index of coal 
consumption increased from 14 to 24 from 1876-80 to 1881-85, a 
relative increase of over 70 per cent, but the index rose only ten spaces 
on the scale; on the other hand, from 1921-25 to 1926-30 the oil con- 
sumption index rose from 375 to 518, an almost identical percentage 
increase (74 per cent), but 143 spaces on the scale were required to 
show it. In order to give an accurate visual conception of these relative 
rates of change, the arithmetic scale must be abandoned in favor of the 
logarithmic. This fourth method is shown for the oil-coal data in 
Figure 44-D and will be explained later. 
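
The shrinking percentage value of a fixed space on an arithmetic scale can be seen in a short computation. The following sketch uses the coal index figures quoted above (14 to 24) together with hypothetical levels of 10, 100 and 1,000; the function name percent_change is an assumed one.

```python
def percent_change(old, new):
    """Relative change from old to new, in per cent of the old value."""
    return 100.0 * (new - old) / old

# The coal index quoted in the text: 14 to 24 is ten spaces on the scale.
coal_rise = percent_change(14, 24)

# The same ten spaces at three hypothetical levels of the scale.
ten_space_rises = [percent_change(level, level + 10) for level in (10, 100, 1000)]

print(round(coal_rise, 1))
print(ten_space_rises)
```

A rise of ten spaces is a 100 per cent increase at a level of 10, but only 10 per cent at 100 and 1 per cent at 1,000.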

Breaks in time series scales: Vertical Scale: Strict accuracy re- 
quires that there should be no break in the arithmetic scale of a time 
series line graph any more than in the case of bars. However, it 
is often just as important to study the fluctuations in the variables 
as to compare the actual total values between the points of the curves. 
The lowest value of any of the variables may amount to several million 

6 Students will be able to make this type of interpretation in greater detail after 
studying measures of dispersion in chapter XVIII. 

7 Note Figure 78, page 555, in which two scales are equated to the values at the
first period instead of to the respective averages. In this case the purpose of the graph 
is to show how the two series diverge. 



units, and a scale covering a complete range between zero and the 
highest value to be graphed would become so small that a change of 
even several thousand units would cause no perceptible movement in 
the curve. Common practice, therefore, permits a "break" in the verti- 
cal scale below the lowest point needed for any of the points plotted. 
Zero is indicated as the base, then a double jagged line is drawn 
(using finer lines than the curves on the graph) and the scale may be 
resumed above the break at any value required. The break represents 
an actual tear in the paper, hence the vertical scale line and all grid
lines are left blank between its boundaries. (See Figures 73-A, 74 and
75, pages 540, 543, and 546, respectively.) 

Likewise when the values of the series are in the form of index 
numbers, if the vertical scale is incomplete, a more careful study 
becomes necessary in order to estimate correctly the percentage of 
fluctuation. However, in order to enlarge the per cent scale, zero is 
frequently omitted altogether, the scale being extended below 100 
only as far as the data require. In any case, the 100 per cent or normal 
line should be emphasized, since it is just as important a standard 
as zero. 

Horizontal Scale: So much emphasis has been placed on the fact 
that the regular intervals of a time series graph accurately depict the 
even progress of time that any suggestion of tampering with this scale 
may well be questioned. There is no abrogation of the principle that 
equal spaces on any arithmetic scale should always represent equal 
values. However, just as under some circumstances a break may be 
permitted in the vertical scale, there are also situations which justify 
changes rather than breaks in the time scale. 

In many business charts the main interest is in the presentation of 
current monthly or weekly data. Comparison with the past is a matter 
of only secondary importance, and it has therefore become customary 
to enlarge the scale for the current year and to contract the scale 
for previous years. If space is limited, and it is desirable to show some 
information regarding changes in the past for as long a period as 
possible, there are several methods for representing the earlier years 
in condensed form. Some possible alternatives are: (1) to show on a 
small scale monthly changes for several previous years; (2) to repre- 
sent only the annual averages by a single point in each year of the 
earlier period; (3) to use a series of vertical bars each of which indicates
the range of fluctuation within a single year. Any of these methods
gives a more complete and continuous story than can be gained
from a chart in which there are either no data at all regarding previous 
periods, or else there is a complete break of several years with no 
indication of what happened in between. For the business man who 
has become accustomed to these forms there is no misrepresentation 
of facts. He has in compact form just the information he wants, and 
is well aware that the current year is a sort of "slow motion picture" 
in comparison with the previous scale. However, these procedures are 
not recommended for the student as methods for general interpretation. 
When they are used, the chart should be divided vertically into several 
segments separated by narrow white spaces to indicate the points at 
which the base scale has been changed. Figure 71, page 531, illus- 
trates two changes in the time scale. The main interest is in the weekly 
data shown on a large scale for 1940 and 1941; the preceding four 
years of monthly data are shown on a smaller scale; and for the 
earlier years the bars show the annual range and the average for 
each year. 

Logarithmic Scale. The logarithmic 8 or ratio scale is widely used
as a graphic device because it permits equal spaces to represent equal 
percentage changes at any point on the vertical scale of a time series. 
The space between 100 and 200 is the same as between 50 and 100, 
20 and 40, 6 and 12, 1.5 and 3, .005 and .01, and so on. 
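
A brief check of this property is given below as a sketch. It measures each pair named above by the difference of common logarithms, which is the assumption on which the ratio scale rests; the variable names are illustrative only.

```python
import math

# Pairs of values named in the text; in each pair the second is double the first.
pairs = [(100, 200), (50, 100), (20, 40), (6, 12), (1.5, 3), (0.005, 0.01)]

# Space occupied on a ratio scale = difference of the common logarithms.
spaces = [math.log10(high) - math.log10(low) for low, high in pairs]

for (low, high), space in zip(pairs, spaces):
    print(low, high, round(space, 4))
```

Every pair occupies the same space, namely the logarithm of 2.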

Explanation of principle: Figure 45 illustrates the method by which 
the spacing on the logarithmic scale is determined. On the left (A) is 
an ordinary arithmetic scale, from 0 to 2; in the center (B) is a
column of logarithms, whose characteristics range from 0 to 2, marked
off at points measured according to the arithmetic scale (A); on the
right (C) is a column of natural numbers which are the anti-logs of 
the logarithms opposite them on scale (B). These natural numbers in 
(C) are therefore spaced according to what is known as the logarithmic 
or ratio scale. 

The advantage of using scale (C) is based on the rule for multi- 
plication by means of logarithms: when two numbers are to be multi- 
plied together, their logarithms can be added and the sum will be the 
logarithm of the product of the two numbers. In Figure 45 the space 
marked a c on the center scale stands for the logarithm of 6; a b 
stands for the logarithm of 2. Hence if we wish to multiply 6 by 2 

8 See Appendix C for explanation of logarithms, rules for their use, and table of
logarithms. 



FIGURE 45
CONSTRUCTION OF THE RATIO SCALE

(A) Arithmetic scale used in spacing logarithms of scale B. (B) Logarithms measured by scale A. (C) Natural numbers, anti-logs of scale B.

(that is, to double it or increase by 100 per cent) we can add a space 
c d, equal to a b, to the space a c, and we should arrive at a d, 
the logarithm of 12. This is precisely what does occur, as can be veri- 
fied by measuring with a ruler. Likewise the space from log 4 to log 8 
can be measured and proved equal to the same space a b, as is also 
log 75 to log 150, log 300 to log 600, etc. In other words, adding the 
space a b, representing log 2, to any other value at any point on 
the scale (C) will multiply that value by 2, or increase it by 100 
per cent. 

Now take the space on scale (C) measured by log 3. Added to 
itself, we reach log 9; added to log 10 we reach log 30; etc. In each 
case the original value of the anti-log is multiplied by 3 or increased 
by 200 per cent. 

The space measured on scale (C) by 1 to 10, or 10 to 100, is called 
a cycle. Every point in a cycle is ten times the value of the corres- 
ponding point in the cycle just below it, or represents 900 per cent 
increase. 
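
The rule that adding spaces multiplies values can be verified without a ruler. The sketch below treats a position on scale (C) as the common logarithm of the value; the function name move_up is hypothetical.

```python
import math

def move_up(value, space):
    """Value reached by moving `space` logarithmic units up the ratio scale."""
    return 10 ** (math.log10(value) + space)

log_2 = math.log10(2)
log_3 = math.log10(3)

twelve = move_up(6, log_2)      # log 6 plus log 2
nine = move_up(3, log_3)        # log 3 added to itself
thirty = move_up(10, log_3)     # log 3 added to log 10
one_cycle_up = move_up(7, 1.0)  # one full cycle above 7

# Per cent increase produced by one full cycle.
cycle_increase = 100.0 * (one_cycle_up - 7) / 7

print(round(twelve, 6), round(nine, 6), round(thirty, 6), round(cycle_increase, 6))
```

Apart from rounding in the last decimal places, the results are 12, 9 and 30, and a full cycle gives an increase of 900 per cent.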

Percentages of decrease follow the logarithmic rule of division: the 
quotient is the anti-log of the difference between the logarithms of two 
numbers. Just as in the case of any percentage change computation, 
per cents of increase or decrease between two given points have different
bases and must be read differently. That is, log 50 plus log 2 =
log 100, an increase of 100 per cent. But log 2 subtracted from log
100 = log 50; 50 is 1/2 of 100, a decrease of 50 per cent. Similarly
log 3 subtracted from log 90 = log 30; 30 is 1/3 of 90, a decrease
of 66 2/3 per cent. And log 50 subtracted from log 200 = log 4; 4 is
1/50 of 200, a decrease of 98 per cent.
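
The same computations may be set out as a sketch. The helper names are assumed; the figures are those of the paragraph above.

```python
import math

def subtract_log(value, divisor):
    """Value reached by subtracting log(divisor) from log(value)."""
    return 10 ** (math.log10(value) - math.log10(divisor))

def percent_decrease(old, new):
    """Decrease from old to new, in per cent of the old value."""
    return 100.0 * (old - new) / old

for value, divisor in [(100, 2), (90, 3), (200, 50)]:
    new = subtract_log(value, divisor)
    print(value, divisor, round(new, 2), round(percent_decrease(value, new), 2))
```

The per cent of decrease is always figured on the higher value as base, which is why subtracting log 2 gives a decrease of 50 per cent while adding it gives an increase of 100 per cent.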

In the portion of logarithmic scale illustrated in Figure 45, only two 
cycles are shown, from 1 to 10 and 10 to 100. This scale can be 
extended upward, of course, to 1,000, 10,000, etc., and it may also 
be extended downward indefinitely to .1, .01, .001, .0001, but never 
can reach zero. 9 There is therefore no zero base on this scale, nor 
any other fixed point from which heights are measured. Hence only 
the portion of the scale that is used in plotting need be shown in the 
graph of a given series. Relative changes are measured by the distance 
between any two points on the vertical scale. Any alteration in the 



9 Consequently the logarithmic scale cannot be used for a series that includes zero
or negative values.



slope of a curve, therefore, indicates a changing relative rate of change 10 
in the data. A curve that follows a straight line upward is increasing 
at a constant relative rate. The possibilities of changing relative rates 
of increase in a curve, and corresponding relative rates of decrease, are 
illustrated in Figure 46. 

a) If it is convex upward, it is increasing at an increasing relative 
rate. 

b) If it is concave upward, it is increasing at a decreasing relative 
rate. 

c) If it is concave downward, it is decreasing at an increasing relative
rate.

d) If it is convex downward, it is decreasing at a decreasing relative 
rate. 
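
The relation between slope and relative rate can be illustrated numerically. In the sketch below both series are hypothetical and the function names are assumed. The slope of each segment on a ratio scale is taken as the difference between successive logarithms.

```python
import math

def relative_rates(series):
    """Per cent change from each value to the next."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]

def log_slopes(series):
    """Rise of each segment when the series is plotted on a ratio scale."""
    return [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

constant = [10, 20, 40, 80]   # hypothetical: doubles every period
slowing = [10, 20, 30, 36]    # hypothetical: the rate of increase falls off

print(relative_rates(constant), log_slopes(constant))
print(relative_rates(slowing), log_slopes(slowing))
```

The first series rises by 100 per cent in every period and its three slopes are equal, so it plots as a straight line; the second rises by 100, 50 and 20 per cent, and each segment is flatter than the one before.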

Because there is no fixed base line, scale equation between differing 
units presents no great problem. One unit can be shown in tens on 
one side of the scale, and another unit in thousands on the other side. 
The scale values can be adjusted at will in order to bring the curves 
to the relative positions that afford the most effective comparison of 
their slopes at various periods, provided the original ratio relationship 
set up by the logarithmic scale is not tampered with. 11 This means that
every value may be multiplied by the same number throughout, 
changing the cycles, for example, from 1-10-100 to 3-30-300, or 4-40- 
400. Each cycle is still ten times the value of the cycle below it, and the 
intervening spaces also keep their original ratio values. It is sometimes 
convenient to make this adjustment by multiplying in order to bring 
the curves closer together or to bring all the values within one less 
cycle. For example, the series 8, 20, 36, 47, 80, 200 would require 
the use of three cycles 1-10-100-1,000. But if the scale is multiplied 
by any factor from 2 to 8, the series will fall within two cycles, 2-20-200 
or 8-80-800. 
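
The effect of the multiplier on the number of cycles required can be checked by a short routine. This is a sketch only: the function cycles_needed is hypothetical, and it assumes that the cycle boundaries fall at the multiplier times powers of ten.

```python
def cycles_needed(series, multiplier=1):
    """Count the cycles required when cycle boundaries fall at
    multiplier x 1, multiplier x 10, multiplier x 100, and so on."""
    low, high = min(series), max(series)
    bottom = multiplier
    # Move the bottom boundary down until it is at or below the lowest value.
    while bottom > low:
        bottom /= 10
    # Move it up while a higher boundary would still be at or below the lowest value.
    while bottom * 10 <= low:
        bottom *= 10
    cycles, top = 0, bottom
    # Add cycles until the top boundary reaches the highest value.
    while top < high:
        top *= 10
        cycles += 1
    return cycles

series = [8, 20, 36, 47, 80, 200]
for m in range(1, 10):
    print(m, cycles_needed(series, m))
```

For the series in the text the routine reports three cycles for multipliers of 1 and 9 and two cycles for every multiplier from 2 to 8.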

Example and interpretation: The advantage of the semi-logarithmic 
time series graph, that is, one in which the time scale is arithmetic 
and the vertical scale logarithmic, can be illustrated by a study of 



10 When we speak of rates of change on a logarithmic scale, those rates are expressed 
in per cents. They are really relative rates of change and the fact that they are expressed
as per cents is usually taken to imply relative rates without employing the cumbersome 
terminology. However, we shall use "relative rates of change" in this text to keep the 
student constantly reminded that the rates of change are expressed in per cents. 

11 Note that if the original values are moved up on the scale, as moving 1 to 2, 
2 to 3, etc., or if an equal amount is added to each original value, the true relationship 
of the logarithmic values will be entirely distorted.



FIGURE 46
CURVES SHOWING CHANGING RELATIVE RATES ON A RATIO SCALE

(Two panels: rates of increase; rates of decrease.)



Figure 44-D. After looking at the three preceding graphs of these 
data, drawn on three different arithmetic scales, one scarcely knows 
which fuel has increased in use the more rapidly. The rates of change 
could be computed from A or B but certainly they are not readily 
apparent on either graph. From Figure 44-C which shows the index 
numbers, the percentages of change can be compared as related to a 
certain base period. This indicates that the use of oil has increased 
faster than that of coal, but if some earlier period had been taken as 
the base the relative increase in the use of oil would have been greatly 
exaggerated. With the logarithmic scale, however, the difference in 
the slopes of the two curves is apparent at a glance. 

From the first to the second five-year average we know that oil- 
produced power increased at a greater rate than coal-produced power 
because its curve slants upward more sharply. For the second five-year 
interval the use of coal increased a little faster than that of oil, but 
during the next period the two curves are practically parallel, hence the 
two were increasing at an equal relative rate. From this time on until 
1926-30 the use of oil increased more rapidly during every period
except 1891-95 to 1896-1900. During the final period the consumption
of both fuels declined, but coal more sharply than oil. 

Coal-produced power increased at an almost constant relative rate 
from 1881-85 to 1906-10, then it leveled off slightly for two periods, 
and finally turned downward. Its subsequent course has been a decrease 
at an increasing relative rate except for a minor recovery in 1926-30. 
Oil-produced power likewise increased at a constant relative rate 
from 1896-1900 to 1906-10; the line then leveled off, indicating 
a smaller relative rate of increase for the next two periods, after 
which it turned up and resumed its previous course for one more five- 
year period. This periodic rise in the curve can be measured on the 
scale and found to be equivalent to the distance between 1.0 and 1.7, 
or a 70 per cent increase. Computation either from the original data, 
Figure 44-A, or from the index numbers, 44-C, will prove this to 
be approximately correct for the three periods, 1896-1900 to 1901-5, 
1901-5 to 1906-10, and 1916-20 to 1921-25. 
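
A distance measured on the ratio scale is converted into a per cent of increase by taking the anti-log. The sketch below assumes common logarithms and uses a hypothetical function name.

```python
import math

def percent_from_log_distance(distance):
    """Per cent increase represented by a vertical distance on a ratio scale,
    the distance being expressed as a difference of common logarithms."""
    return (10 ** distance - 1) * 100.0

# Distance between 1.0 and 1.7 on the scale, as described in the text.
distance = math.log10(1.7) - math.log10(1.0)
print(round(percent_from_log_distance(distance), 1))
```

The distance between 1.0 and 1.7 corresponds to an increase of 70 per cent wherever on the scale it is laid off.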

It scarcely needs to be added that since the slopes of the curves 
are so significant in this type of graph, neither of the scales can be 
tampered with in any way. Any omission of intervals or change in 
the time scale would entirely distort the slopes of the curves. 

Methods of making a ratio scale: Semi-logarithmic graph paper can 
be purchased in any needed number of cycles and with almost any 
arrangement of the base scale in time intervals. However, in order 
to use the ratio scale confidently, the student should understand how 
it is made and should be able to make his own scale if necessary. It 
can most easily be marked off with a slide rule if one is available. 
If a table of logarithms is at hand the simplest procedure is not to 
draw a complete ratio scale, but to plot the logarithms of each point 
on an arithmetic scale, just as scale (B) was plotted according to 
scale (A) in Figure 45. The plotted points will be equivalent to the 
anti-logs (natural numbers) plotted on a ratio scale, just as scale (C) 
was equivalent to scale (B). If neither of these aids is to be had, 
correct proportions may be obtained by plotting a geometric series 12 at 
evenly spaced intervals on the vertical scale using any starting point 
($t_1$) and any common multiplier ($r$). Thus if $t_1 = 1$ and $r = 2$,
a scale of 1, 2, 4, 8, 16, 32, 64, 128, 256, etc., could be used for 



12 In a geometric series the ratio of any term to the preceding term is constant. If $t_1$
denotes the first term, $n$ the number of terms and $r$ the common ratio, the series may be
written $t_1, t_1 r, t_1 r^2, \ldots, t_1 r^{n-1}$.



plotting. Although the scale is accurate, the plotted curve may be some- 
what approximate because it is hard to determine exact values on 
such a scale. 
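
Both procedures can be sketched in a few lines. The function names are assumed, and the height of one cycle is arbitrarily taken as 1 unit of the arithmetic scale.

```python
import math

def ratio_scale_heights(values, cycle_height=1.0):
    """Height at which each value is marked, measured from the mark for 1."""
    return [cycle_height * math.log10(v) for v in values]

def geometric_series(first, ratio, terms):
    """First term, then each term multiplied by the common ratio."""
    return [first * ratio ** k for k in range(terms)]

scale_values = geometric_series(1, 2, 9)
heights = ratio_scale_heights(scale_values)
gaps = [b - a for a, b in zip(heights, heights[1:])]

print(scale_values)
print([round(g, 4) for g in gaps])
```

The terms of the geometric series fall at equal intervals when their logarithms are plotted, which is why the series may be written directly at evenly spaced points on the vertical scale.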

PLANNING GRAPHS FOR GENERAL EFFECT 

After having selected the kind of graph he intends to use to illus- 
trate his point, the statistician is ready to block out his actual plan 
for drawing. His degree of success at this point will be in direct 
proportion to his ability to combine artistic principles and technical 
skill with statistical acumen. 

Artistic Considerations 

Fortunately, no artistic genius is required in order to create an 
artistically effective graph. It is necessary only to understand the 
simplest rudiments that serve as guides to practically every form of 
artistic expression: size, proportion, balance, and contrast.

Size. The size will depend primarily on how the graph is to be 
used: is it to be published, or used for lecture purposes? A wall 
chart must be large and clear enough to be seen from any point in 
the room or auditorium. There is no use in preparing such a chart 
if it is too small. The lighting conditions under which it is to be shown 
must also be taken into account. 

If the graph is to be printed the size of the page will determine 
its final dimensions, but the original may be drawn from l to 3 or 
4 times larger. Less meticulous care will be needed to draw it on a 
scale larger than its final form since small imperfections will dis- 
appear in photographic reduction. 

The amount of detail included is also a factor in determining 
the size of a graph. If only a single important relationship or a gen- 
eral condition is to be emphasized, one-half or even one-quarter of a 
page will suffice; whereas a more important or comprehensive graph 
will require a full page. If the graph includes a great deal of complex 
information, a variety of different kinds of lines, and detailed scales 
and legends, it may be necessary to use a folded insert even larger than 
the page of the book. 

Students who use prepared graph paper of standard size should 
remember that it is not necessary always to make a graph that will 
cover the entire sheet. Each graph can be made of suitable size and 
proportion by inclosing a part of the page within a border.



Proportion. The exact relation between the length and width of a
graph is determined to some extent by the data that are being presented,
but in general there is a range within which a pleasing effect
may be attained. If a graph is too long and narrow, either horizontally 
or vertically, it has an awkward, stretched-out appearance. Square 
graphs present a monotonous appearance and do not fit the page 
conveniently. The proportions will be within a pleasing range if the 
length is somewhere between 1J and Ij times the width. 18 Prob- 
ably the most convenient standard to use in preparing material for 
publication is known as "root two" or the ratio of 1.414 to 1. The 
long side is equal to the diagonal of a square drawn on the short side, 
and consequently if the rectangle is divided in half the resulting 
rectangles have the same proportions as the original, i.e., 1 to .707. 
A graph that is drawn with these proportions may occupy a whole page 
turned the long way, or reduced to half size will fit across half of 
the same space in normal position. 
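
The property of the "root two" rectangle can be confirmed by a brief computation. The function halve is an assumed name, and the short side is taken as 1.

```python
import math

def halve(long_side, short_side):
    """Divide a rectangle in half across its long side and return the
    (long, short) sides of one of the two resulting rectangles."""
    half = long_side / 2
    return max(half, short_side), min(half, short_side)

long_side, short_side = math.sqrt(2), 1.0
new_long, new_short = halve(long_side, short_side)

print(round(long_side / short_side, 3))   # proportions of the original
print(round(new_long / new_short, 3))     # proportions of each half
print(round(new_short, 3))                # short side of each half
```

The ratio of long side to short side is the same before and after the division, which is not true of a rectangle drawn in any other proportion.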

Balance. The term "balance" as applied to a graph has the same 
meaning as in any other kind of picture. It is a term borrowed from 
physics to indicate that there is an approximately equal stress on either 
side of a central point. The statistician is not at liberty to select his 
data so that, for example, the peaks and troughs of his curves will 
balance artistically. He must therefore depend upon his auxiliary 
material if necessary to offset the appearance of an unbalanced set of 
data. To a certain degree he can enlarge one scale and reduce the 
other in order to alter the shape of his curve, although discretion must 
be used to avoid an exaggerated effect of fluctuation. If he has a set 
of bars that nearly fill the entire space, he will make them slender 
enough so that they will not bulk too large and heavy. If in spite 
of every effort a good fourth or more of the surface remains blank, he 
might print his title or key in that section, or insert a small table of the 
data. (See Figure 44-A and C.) Note, however, that it is never permissible
to insert printed material between any significant portion of
the graph and its accompanying scale. Instead of using a key, if legends 
are printed close to each curve, it is usually possible to distribute them 
in clear spaces on the graph instead of bunching them all at the top 
or at one side. The addition of a border is a great aid in tying all 
parts of a graph together into a well-balanced whole, and is particularly

18 "Length" refers to the longer dimension which is most frequently the horizontal
measurement; "width" is the shorter dimension, usually the height.

necessary for maps, circles, and simple bars. (See Figures 35
and 36, pages 299-305.) Many graphs will be found in print that 
have no borders except for the page margins. These graphs are usually 
of the two-dimensional type, such as Figures 42 and 43, in which 
the limits of the grid itself bounded by the horizontal and vertical 
axes with the title of the graph printed above practically take the place 
of a border in marking out the definite space occupied by the graph. 

Contrast. Boldness is the secret of effective graphic presentation. 
Since the sharpest contrast can be achieved by the use of black and 
white, this combination is most commonly used in statistical graphs. 
Other reasons for preferring black and white are: (a) Graphs in color 
are much more expensive to reproduce, and some colors cannot be 
used in ordinary photostating. (b) Colors cannot be arranged in varying
degrees of intensity as unmistakably as can the standard types of
black and white cross-hatching. (c) All readers do not evaluate colors
in the same way, and some may even be color blind. 

Appropriate types of cross-hatching for various purposes have already 
been discussed and illustrated in chapter XIII by the circle in Figure 
35-B, the map in Figure 37, the bars in Figure 41, and in this chap- 
ter by the strata charts, Figure 43. To sum up the general rule 
again: when cross-hatching is used to represent quantitative informa- 
tion, increasing magnitudes must be indicated by increasingly intense 
or dark types; if it is used only to distinguish one set of data from 
another according to some non-quantitative characteristic, any kinds 
of cross-hatching may be chosen that will afford the greatest possible 
contrast, usually by alternating light and dark types. It is possible to 
buy gummed paper printed in a great variety of cross-hatched patterns. 
This may be applied to the graph and trimmed to the desired shapes 
with great saving of time. 

Contrast may also be achieved by differentiation in types of lines 
when several curves are being presented. A number of possible types 
are shown in Figure 47. The most important data, such as a combined 
index, can most effectively be represented by a solid black line, and 
the lines for the other curves should be selected so that they are easily 
distinguishable from one another and so that all are equally distinct. 
The curves representing the data should be heavier than the back- 
ground lines on the graph. The usual order of these background lines 
is: border heaviest, followed by vertical and base scales and 100 per 
cent line, with other grid lines lightest. 



FIGURE 47 
TYPES OF LINES 



When symbols are employed in a pictogram, boldness should be 
the primary consideration. One or two kinds of solid black symbols 
of standard shape, whose meaning cannot possibly be misunderstood 
are much more effective than a variety of outline sketches that cannot be 
interpreted without reference to some printed material. 

The printing on a graph should be heavy enough to contribute to 
the effect of contrast, but not so heavy as to detract from the diagram 
itself. Vertical capital letters and figures with no ornamentation what- 
ever are most suitable for this purpose, and are most easily read. The 
heaviest lettering will be used for the title, a smaller size for the 
legends, scales, etc., and probably the smallest of all for the reference 
to the source or other notes of explanation. It should go without saying 
that neatness in lettering is one of the most essential features of an 
artistic graph. 

Technical Details 

Certain techniques in graphic construction have come to be accepted 
as standard. These details should be observed, not because they are 
fixed by an arbitrary set of rules but because they are all founded 
upon the principle that graphs must give a clear idea with the min- 
imum of effort on the part of the reader. 

Title. The title of a graph must meet the same requirements that 
were established for the title of a table. 14 It should not give the con- 
clusion to be drawn from the graph, as "Sales Larger This Year Than 
Last"; nor the method of analysis, as "Frequency Distribution of 
Number of Employees." 15 Information such as the units of measure,

14 Chapter VIII, pp. 160-61.

15 Whenever a title of this sort is used in this text, it is because the method being 
illustrated is of greater importance to the reader than the actual data. 



and the subgroups of a classification, which will be clearly indicated 
by the scales and legends or key, need not be included in the title 
of a graph. 

Legend or Key. These terms are often used interchangeably, but 
for this discussion "legend" will refer to labeling written within the 
bars, sectors, etc., or adjacent to the curves of time series to tell what 
each represents. (See the graphs in chapter XX.) "Key" will refer 
to a group of blocks or lines at the bottom of a graph indicating by 
sample pieces of the various lines or types of cross-hatching the 
significance of each wherever it may appear on the graph. (See the 
graphs in chapter XXII.) There is no hard-and-fast rule as to which 
method should be employed. In general, it may be said that if the 
lines or hatching are repeated in several different parts of the graph 
it is better to use one key that will apply to all parts. In a map 
where the same kind of hatching or dots appears in a number of 
different areas, a key is practically unavoidable. If the graph includes 
two or more circles or sets of bars, each having corresponding parts 
that follow a common system of shading, it is easier to follow one key 
than to read the same legends in several different places. On the 
other hand, if there is plenty of space to print a clear legend right 
next to the curve or within the sector, good judgment will indicate 
that this should be done. 

Whenever legends are printed on the graph there are a number of 
points to consider. (a) There must be no possible confusion as to
which curve or other part the legend is intended to mark. (b) If possible,
avoid printing between any line or bar and the scale from which
its value must be read. (c) On a closely crossed grid a white space
inclosed in a border should be left clear for printing each legend.
(d) Legends should be clearly printed, and worded as briefly as
possible. 

The use of a key likewise calls for some words of caution. (a) The
key must be neatly ruled and adequately labeled. (b) The lines or
hatching must correspond exactly to those used in the graph. (c) The
key is a part of the graph and should be inclosed within the outer 
border, if the graph has a border; certainly it should never be trans- 
ferred to some other page. 

Scales. A scale has two parts: its general label and the markings 
of its subdivisions. 

Labels: The label states the unit of the vertical scale or the numerical
classification of the horizontal scale, as tons, dollars, years, etc.
When the units are counted in large groups instead of singly, it also 
indicates the number in each group. To avoid confusion in locating the 
decimal point units should be grouped in thousands, millions, or bil- 
lions, rather than in tens, hundreds, ten thousands, etc. For example, 
in a scale having a range of 0 to 2,000 tons, the values might well
be written in full, or they might be shown as .25, .50, 1.00, 1.25, etc., 
under the label "tons in thousands" or "tons, 000 omitted." Just "000" 
means nothing; one does not know whether to interpret it as "in 
hundreds" or "000 omitted." Sometimes the ciphers omitted are stated 
in the title, in which case they should not be repeated in the scale 
label. The full contraction must be indicated in either one place or 
the other, never divided between the two. 

If a multiple scale is being used, each scale must state the item 
to which it applies. A graph of index numbers or other per cents 
will be labeled either "index" or "per cent," with no reference to the 
original unit. The graphs shown in this text should be observed for 
standard practice in wording and arrangement. Note that the labels 
always read parallel to the base of the graph; that is, the label of 
the vertical scale appears across the top of that scale rather than 
vertically along the side, whereas for the horizontal scale it is in 
the center under the markings for years, months, etc. 

Scale divisions and grid lines: Grid lines are scale divisions that 
are drawn all the way across a graph. They are usually fine solid lines 
of uniform thickness, although the lines indicating the ends of years, 
intervals of 50, etc., may be heavier than the intervening lines in order 
to set off the major divisions of a chart. Frequently only these main 
guide lines are drawn all the way across, the other values being 
indicated by short stubs along the axis. It is not necessary to indicate 
the numerical values of each one of these stubs but only enough 
of them to enable the reader to determine the value of any plotted 
point without too much trouble. The figures that are printed along 
a scale should be directly opposite the points to which they refer. 

The methods of marking intervals in a time series require special 
attention in order to avoid confusion in reading the graph. First we 
shall consider the various ways of charting annual data. There are 
four alternatives, as shown in Figure 48. 

In A, the year is indicated directly below the grid line on which the 
point is plotted. This is the preferable method for recording values
at the same date, such as May 1, for several successive years; it can
also be used for yearly averages or totals, or whenever a single figure 
represents the entire year. 

However, B is more suitable for the latter situation, since each 
point is plotted at the center of the space between two grid lines, 
the label for the year being also centered directly below it. Method B 
would also be correct for data as of June 30 or July 1, but not for 
any other given date, since the plotted points fall in the center of the 
yearly spaces. 

Method C is incorrect for yearly averages or totals but could be 
used if the data were as of December 31 or January 1, provided this 
fact were made clear in the title. For any other data it is ambiguous 
because one cannot tell whether the year named in the center of the 
space applies to the point on the grid line preceding it or following it. 

The last method, D, is the reverse of C; it is equally ambiguous and 
would be correct under no circumstances. 

The same principles can be applied to the correct graphing of 
monthly data. The space representing a year is usually set off by 
grid lines. This annual space must therefore be divided into 12 equal 
parts, each of which represents one month. The spaces indicating the 
months need not be labeled in a long time series, although if the 
scale is large enough the abbreviation or initial of the month at the 
center of each space is an aid to the reader. Years should be printed 
out in full, horizontally below the monthly labels, at the center of the 
year's space. 

If the monthly data are totals, averages, or mid-month recordings, 
they should be plotted at the center of each month's space. There will 
then be no value plotted directly on the grid line that marks the year's 
end, and it will be perfectly clear which point stands for December 
and which for January (See Figure 48-E). However, end-of-month
data should be plotted at the end of each month's space, so that
December 31 will quite correctly fall on the end-of-the-year's mark
(See Figure 48-F). As in the case of annual data, Figure 48-C, this
method is ambiguous unless the title of the graph indicates that the 
plotted points are recordings as of the last day of each month. 
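
The placement rules for monthly data can also be stated numerically. The following is a minimal sketch and not part of the original text: it assumes that the grid line opening a year lies at the value of that year on the horizontal scale and that each month occupies one-twelfth of the year's space. The function name `month_position` is hypothetical.

```python
def month_position(year, month, end_of_month=False):
    """Return the horizontal position of a monthly point, in years.

    Assumes (for illustration only) that the grid line opening `year`
    is at x = year and that each month takes 1/12 of the year's space.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # Totals, averages, mid-month recordings: center of the month's space.
    # End-of-month recordings: right-hand edge of the month's space.
    offset = month if end_of_month else month - 0.5
    return year + offset / 12


# A December 1936 average falls short of the 1937 grid line ...
print(month_position(1936, 12))
# ... but a December 31, 1936 recording falls exactly on it.
print(month_position(1936, 12, end_of_month=True))
```

With centered points no value ever lands on a year line, which is why the December and January points cannot be confused.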

It would also be possible to locate monthly stubs at the center of
each monthly interval, Figure 48-G. This corresponds to method A
for annual data, and causes no difficulty in reading the graph.
However, since the years are labeled at the center of the year's
space, it is more consistent to plot each month at the center of a
space as in 48-E, rather than at a stub.

FIGURE 48
METHODS OF PLOTTING TIME PERIODS
[Diagram: panels A to D show annual data for 1933, 1934, and 1935;
panels E to G show monthly data for 1936 and 1937. Panels are marked
RIGHT, WRONG, or AMBIGUOUS.]

Accompanying Table. Since no graph aims to record exact numer- 
ical values it is always desirable to provide an accompanying table for 
the benefit of the reader who wishes to verify or make further use of 




the information. The table should appear on the same page as the 
graph, or on the page facing it, and both should read in the same 
direction. Students seldom realize the unfavorable impression made on 
an instructor or any other reader who is forced to compare a graph 
that reads vertically with a table that reads horizontally, or vice versa. 
It has already been suggested in the discussion of balance that a 
brief table may be printed on the graph in some unoccupied space if it 
does not interfere with the graphic presentation. 

Reference and Notes. The necessity for quoting the source and 
noting any discrepancies in the data was explained in discussing the 
requirements for statistical tables. 10 Practically the same rules may be 
applied to graphs although, if the information has been given in an 
adjacent table, reference on the graph need only be made to that table 
as the source. The reference, either to the accompanying table or the 
original source, is usually printed in the lower right-hand corner of 
the graph. 

Important Points in Actual Construction. No attempt will be made 
in this text to give a summary of the principles of mechanical drawing. 
A course in that subject is a great aid to anyone who wishes to draw 
graphs neatly and correctly. It is possible, however, to secure manuals 
on the subject, lettering guides, etc. A study of the instructions 
that come with ruling and lettering pens should help the student in his 
first efforts to use India ink. With a few hours of practice anyone 
can learn to handle a ruling pen and lettering stencils without blot- 
ting. Accuracy in scale and angle measurement is not beyond the 
capacity of the average person. Even lettering by hand is only a 
matter of a little care and practice in copying from lettering guides. 

In outline form, the order of steps in drawing a graph is as follows: 

a) Check all data for accuracy in computation or in copying from 
source. 

b) Plan the scales to conform to the correct size and proportion, 
within the range of the data. 

c) Measure scales and draw axes and guide lines in pencil. (More
pencil guide lines will be needed than finally appear on the graph.)

d) Plot the data.



10 Chapter VIII, p. 162.






e) Check plotting, reading from the points back to the data. 

f) Plan spacing of lettering: titles, scales, labels, key, source, etc.

g) Ink in all lines, including borders, guide lines, etc., taking time 
to let each section dry before doing further work near it. 

h) Ink in lettering, using stencils if possible.

i) Erase all pencil marks.

PROBLEMS 

1. What is the order of arrangement of the bars in the bar charts appearing 
in chapter XIII? 

2. The following is the production of anthracite coal in the United States at 
five-year intervals from 1900-40 (thousand short tons) : 



YEAR    PRODUCTION        YEAR    PRODUCTION

1900      57,468          1925      61,817
1905      77,660          1930      69,385
1910      84,485          1935      52,159
1915      88,995          1940      50,052
1920      89,598







a) Present these data in a bar diagram. 

b) Why is this form superior to a line diagram for these data? 

c) How would you read from this diagram the 40-year history of the 
anthracite coal industry? 

3. Find an applied use of the band chart in a published source. Describe the
contents of the chart and state briefly the major relations portrayed.

4. a) Plot the following data on four separate charts, corresponding to the
four methods shown in Figure 44, A, B, C and D, pages 328-29.
Use 1935 as the base for the index number graph.
b) Explain briefly what you think each graph shows.

APPROXIMATE SALES, GROSS PROFIT AND NET PROFIT OF A SMALL
MANUFACTURING CONCERN, 1932 TO 1938

YEAR      SALES      GROSS PROFIT    NET PROFIT

1932     $15,000        $1,000          $100
1933      22,000         3,000           400
1934      18,000         1,500           200
1935      26,000         4,500           800
1936      20,000         2,250           400
1937      42,000         6,750         1,600
1938      33,000         3,375           800






5. a) Draw a graph of the grapefruit production data given below.
b) Study and interpret the facts shown by your graph. 

PRODUCTION OF GRAPEFRUIT IN THE UNITED STATES, 1919 TO 1939 *
(Million Boxes)

YEAR    California    Florida    Texas    Total

1919        .4           5.9                6.3
1924        .4           8.8       .2       9.4
1929       1             8        2        11
1934       2            15        3        20
1935       2            12        3        17
1936       2            18       10        30
1937       2            15       12        29
1938       2            24       16        42
1939       2            17       15        34

* Agricultural Statistics, 1938; and Crops and Markets, December, 1939.

6. a) Why should every ordinary scale chart have a zero base line? 

b) Why in using two vertical scales on the same chart should the two 
scales bear some fixed relation to each other? 

c) Under what conditions is it justifiable to use colored inks in drawing 
charts? 

d) In a two-dimensional graph how do you determine which values to plot 
on the base scale? 

7. a) Find one published graph that you consider is correctly and effectively 

drawn, and explain why you think so. 

b) Find one published graph that you think has certain features that are 
incorrect, and give reasons. 

c) Find one published graph that you consider ineffective, and suggest 
changes that might add to its effectiveness. 

REFERENCES 

ARKIN, HERBERT, and COLTON, RAYMOND R., Graphs, How To Make and Use 
Them. New York: Harper & Bros., 1936. 

BRINTON, WILLARD C., Graphic Presentation. New York: Brinton Associates, 
1939. 

HASKELL, ALLAN C., Graphic Charts in Business. New York: Codex Book
Company, 1922.

KNOEPPEL, CHARLES E., Graphic Production Control. New York: The Engi- 
neering Magazine Co., 1920. 

LEHOCZKY, PAUL N., Alignment Charts, Their Construction and Use, Engineer- 
ing Experiment Station Circular No. 34. Columbus, Ohio: The Ohio State 
University Studies, 1936. 




MODLEY, RUDOLF, How To Use Pictorial Statistics. New York: Harper & Bros., 
1937. 

MUDGETT, BRUCE D., Statistical Tables and Graphs. Boston: Houghton Mifflin 
Co., 1930. 

RIGGLEMAN, JOHN R., and FRISBEE, IRA N., Business Statistics. New York: 
McGraw-Hill Book Co., Inc., 1932, Appendix III. 

RIGGLEMAN, JOHN R., Graphic Methods for Presenting Business Statistics. New 
York: McGraw-Hill Book Co., Inc., 1926. 

SMITH, HERBERT G., Figuring with Graphs and Scales. Stanford University, 
California: Stanford University Press, 1938. 

Time Series Charts, A Manual of Design and Construction. New York: The
American Society of Mechanical Engineers, 1938. 



CHAPTER XV 
FREQUENCY DISTRIBUTIONS AND GRAPHS 

FREQUENCY DISTRIBUTIONS 

A FREQUENCY distribution is simply one of the methods of
classification of data, and in form resembles any other statistical
table. An example that has already been introduced in the text is
Table 16-A, chapter VIII (wage rates of explosives workers).
This particular form of classification has been reserved for special 
treatment because the idea of grouping large masses of data according 
to their quantitative characteristics is one of the most fundamental 
processes in statistics. In many phases of business operations it is an 
important first step toward more advanced analysis. 

A frequency distribution is always a classification of data in which
the items are combined in groups according to size. The "ordered
classification" is the independent variable, and the numbers of items
that appear in the several groups become the dependent variable. For
example, the dependent variable, number of firms, might be classified
in groups according to annual dollars of sales, number of tons of
product shipped weekly, number of employees, or hourly wage rates paid.
On the other hand various dependent variables might be tabulated with
any of these classifications (independent variables); e.g., with a wage
classification the dependent variable might be either the numbers
of employees receiving the various rates, the numbers of years in which
the several wage rates were paid, or the numbers of states in which
the rates were standard. The number of units, or items counted, in
each group is called its frequency.

According to this method of grouping large numbers of detailed
observations, each individual item or occurrence loses its identity and
becomes one of a larger group that has a broader definition of quantity
or value. For instance, in grouping wage data a single wage payment
of $18.52 might become one of a group of 65 payments designated as
"$18.00 to $18.99." It follows that the basic requirements for a
satisfactory frequency distribution are: (1) the value of each individual
item must be known at the outset, and (2) the values must be grouped
in such a way that the summary table will accurately represent the
individual items from which it is compiled.

First Steps in Analysis 

The steps that are followed in making an analysis by means of a 
frequency distribution will be illustrated by rent data that were col- 
lected in Columbus, Ohio. This small but representative sample of 155 
rent payments was secured as a by-product of a study of consumer 
habits in the patronage of dry-cleaning establishments. 

Arraying the Data. The initial step was to list each rent payment
as the reports came in from the interviewers. The result is shown in
Table 56.

TABLE 56

RENTS PAID BY 155 FAMILIES IN A CONSUMER
SURVEY IN COLUMBUS, OHIO

(Dollars Per Month)

$50   $ 8   $60   $50   $75   $21
 25    80    75    50    75    25
 18     9    75    50    53    25
 75    15    95    60    55    15
 55    16    35    60    60    18
 53    35    35    25    60    80
 30    35    35    24    60    75
 50    50    35    35    35    51
 31    30    32    25    65    75
 24    28    22    25    60    51
 13    27    22    25    40    50
 15    25    20    30    40    45
 40    40    40    40    45    35
 65    20     9    12    35    35
 68    18    17    13    35    35
 70    16    18    15    25    30
 80    13    30    18    15    30
 80    85    30    30    16    30
 35    90    35    35    18    18
 35    80    35    30    21    20
 40    65    35    40    21    30
 45    35    85    40    30    35
 40    51    95    40    30    35
 40    51    80    40    35    35
 48    60    70    85    40    35
 50    50    35    35    35

This random listing gives no clue whatever to any possible
interpretation of the data. The question now arises, if they were ranged
in order of size would any significant relationship appear? In order
to answer this they were next arranged in an array, as shown in
Table 57.



TABLE 57

ARRAY OF RENTS PAID BY 155 FAMILIES
IN A CONSUMER SURVEY IN COLUMBUS, OHIO

(Dollars Per Month)

$ 8   $21   $30   $35   $48   $65
  9    21    30    35    50    65
  9    21    30    35    50    65
 12    22    30    35    50    68
 13    22    30    35    50    70
 13    24    31    35    50    70
 13    24    32    35    50    75
 15    25    35    35    50    75
 15    25    35    35    50    75
 15    25    35    40    50    75
 15    25    35    40    51    75
 15    25    35    40    51    75
 16    25    35    40    51    75
 16    25    35    40    51    80
 16    25    35    40    53    80
 17    25    35    40    53    80
 18    27    35    40    55    80
 18    28    35    40    55    80
 18    30    35    40    60    80
 18    30    35    40    60    85
 18    30    35    40    60    85
 18    30    35    40    60    85
 18    30    35    40    60    90
 20    30    35    45    60    95
 20    30    35    45    60    95
 20    30    35    45    60

There are various ways of putting data in an array, depending somewhat
on the form in which they have been collected. If each item is
on a separate card or sheet or schedule, these could first be sorted 
according to size and then listed. Or they might be tallied by assigning 
one line of a ruled sheet of paper to each possible value and then writ- 
ing down each rent as it appears from the random assortment. The 
result would appear as in Figure 49. If the rents or tally marks are 
evenly spaced the resulting rows take the place of a rough bar diagram 
in indicating the distribution of frequencies according to rental value. 
To get a true picture it is necessary to have a line for every unit rental 
value in the series, whether or not it has any frequencies. 
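
The tallying procedure just described can be sketched in a few lines of code. This is an illustration only, not part of the original text: the function name `tally_sheet` and the short sample of rents are hypothetical. The point it shows is the last one made above, that every unit value between the lowest and highest rent gets its own line, whether or not any rent falls on it.

```python
from collections import Counter


def tally_sheet(values):
    """Return tally lines, one for every unit value from min to max."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        # A value with no frequencies still gets its own (empty) line,
        # so that the rows keep their true spacing along the scale.
        marks = "|" * counts[value]
        lines.append(f"{value:>3} {marks}".rstrip())
    return lines


# A small hypothetical sample of rents, tallied in random order.
sample = [25, 18, 30, 31, 24, 13, 15, 25, 18]
for line in tally_sheet(sample):
    print(line)
```

Because the marks are evenly spaced, the printed rows already behave like a rough bar diagram of the frequencies.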

An alternative method for analyzing either the sorted or tallied 
data would be to draw a simple bar diagram as shown in Figure 50. 
The range of values is clearly revealed by comparing the shortest bar at 
the top of the graph with the longest one at the bottom. The values 
at which there are concentrations and the number of similar items 
of various values can be seen by looking for the bars of equal length. 
This type of graph is seldom used for final presentation unless it
portrays characteristics which are peculiar to the data and which
cannot easily be shown by any other graphic form. It is a helpful graph
in preliminary analysis, however, for it provides the basis for the
examination which is necessary before the data can be grouped.

FIGURE 49

TALLY OF MONTHLY RENTS PAID BY 155 FAMILIES IN A CONSUMER SURVEY IN
COLUMBUS, OHIO

[Tally sheet in two columns headed RENT and NUMBER OF FAMILIES, with
one line for each dollar value from $8 to $95.]

FIGURE 50

ARRAY OF RENTS PAID BY 155 FAMILIES IN COLUMBUS, OHIO
(Each bar represents one family)

[Bar diagram; horizontal scale: RENTS IN DOLLARS, marked from 10 to 100.]
Data from Table 57.

Preliminary Grouping of the Data. From a study either of Figure 49
or Figure 50 it can be seen at once that the whole range of
data extends from a low value of $8 to a high of $95. The largest
number of rents appears to be at $35 but there are certain other
concentration points, notably at $25, $30, $40, $50, and $60.



At individual values: This suggests the next logical step which is
the initial grouping process, that is, to count all the items having the
same value. In the frequency array shown in Table 58 all of the rents
appear in order along with the number of times each individual rent
occurs.

TABLE 58

FREQUENCY ARRAY OF MONTHLY RENTS PAID BY 155 FAMILIES IN A
CONSUMER SURVEY IN COLUMBUS, OHIO

RENTALS              RENTALS              RENTALS
PAID      NUMBER     PAID      NUMBER     PAID      NUMBER

$ 8          1       $25          9       $53          2
  9          2        27          1        55          2
 12          1        28          1        60          8
 13          3        30         13        65          3
 15          5        31          1        68          1
 16          3        32          1        70          2
 17          1        35         28        75          7
 18          7        40         14        80          6
 20          3        45          3        85          3
 21          3        48          1        90          1
 22          2        50          9        95          2
 24          2        51          4       Total      155















The characteristics of the data begin to stand out more clearly. We 
now know exactly how many rents of each amount were paid. The 
$35 rent occurs 28 times, having the highest frequency in the array, 
while the $30 and $40 amounts are almost tied for second place with 
13 and 14 frequencies, respectively. The rents less than $35 are con- 
centrated between $8 and $32, whereas those greater than $35 are 
spread over a range from $40 to $95. 
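
The observations in the preceding paragraph can be checked mechanically. The sketch below is an illustration added here, not the authors' procedure: it copies the counts of Table 58 into a dictionary (the name `TABLE_58` is hypothetical), expands them into the array of Table 57, and counts the array again.

```python
from collections import Counter

# Rent -> number of families, copied from Table 58.
TABLE_58 = {
    8: 1, 9: 2, 12: 1, 13: 3, 15: 5, 16: 3, 17: 1, 18: 7, 20: 3,
    21: 3, 22: 2, 24: 2, 25: 9, 27: 1, 28: 1, 30: 13, 31: 1, 32: 1,
    35: 28, 40: 14, 45: 3, 48: 1, 50: 9, 51: 4, 53: 2, 55: 2, 60: 8,
    65: 3, 68: 1, 70: 2, 75: 7, 80: 6, 85: 3, 90: 1, 95: 2,
}

# Expand the counts into the sorted array, then count equal values
# again; this is the "initial grouping process" of the text.
array = sorted(rent for rent, n in TABLE_58.items() for _ in range(n))
frequency_array = Counter(array)

below = sum(n for rent, n in frequency_array.items() if rent < 35)
above = sum(n for rent, n in frequency_array.items() if rent > 35)

print(len(array))                      # number of families
print(frequency_array.most_common(3))  # the three most frequent rents
print(min(array), max(array))          # range of the data
print(below, above)                    # rents under and over $35
```

Counting in this way confirms the total of 155 families and shows how the rents divide on either side of the $35 concentration.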

In class intervals: However, there are still too many separate values
listed for easy comprehension of the complete information regarding
these rents. The entire situation can be readily grasped only after the
155 items have been grouped into a few classes. These classes must
cover the entire range from $8 to $95 and must represent as far as
possible the characteristics that have been observed from studying
the individual items. Before continuing this process with the rent data
it will be necessary to consider a number of points that must always
be taken into account in determining the groups of a frequency
distribution.

Principles for Grouping Data 

The questions that must be answered before deciding how to group 
any individual data are: 




1. Into how many groups, or class intervals, should a given set 
of data be divided? 

2. What should be the width of each interval? 

3. At what values should the class limits be set? 

4. How should the class limits be designated? 

Number of Intervals. The number of intervals is less important 
than the width of intervals and the values of class limits. The exact 
number used will finally be determined by the range of the data after 
these other two points have been decided. There are a few rule-of- 
thumb guides, however, which aid in roughly determining the number 
of intervals. In the first place, since the purpose of grouping is to aid 
in the summary and comprehension of data, there should be no more 
intervals than can be quickly grasped. In the second place, the number 
of intervals cannot be so small that important characteristics of the 
data are concealed. These two criteria, however, are too general
to serve as operating rules in determining how many class intervals
to use in a given frequency distribution.

Statisticians have indicated the number of intervals which in general 
meet the requirements of distributions of most kinds of data. Yule 
says that desirable conditions will usually be fulfilled if the "number 
of classes lies between 15 and 25." 1 A minimum is suggested by the
statement that it is "desirable to have more than eight classes." 2 It has 
been suggested that the number of classes can be determined by the 
use of a formula which has been developed from the theory of binomial 
expansion. The formula as developed by Sturges 3 is: Number of class 
intervals = 1 + 3.322 log of number of observations in the distribu- 
tion. Solution of this formula indicates the following number of 
class intervals should be used with designated numbers of observations: 

NUMBER OF          NUMBER OF
OBSERVATIONS       CLASS INTERVALS

100                  8
200                  9
400                 10
600                 10
800                 11
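
The figures in this table can be reproduced directly from the formula. The sketch below is an added illustration: it reads "log" as the common (base-10) logarithm and rounds the result to the nearest whole number, which is the reading that agrees with the table. The function name `sturges_intervals` is hypothetical.

```python
import math


def sturges_intervals(n_observations):
    """Number of class intervals by the formula 1 + 3.322 log10(N)."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return 1 + 3.322 * math.log10(n_observations)


for n in (100, 200, 400, 600, 800):
    k = sturges_intervals(n)
    # Print the unrounded value beside the whole number of intervals.
    print(n, round(k, 2), round(k))
```

Doubling the number of observations adds only about one interval, so the formula gives a slowly growing guide rather than a precise requirement.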

The number of classes should be determined only after making a
careful study of all the characteristics of the data, instead of applying
this formula indiscriminately.

1 G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics
(London: Charles Griffin and Co., Ltd., 1937), p. 85.

2 Frederick E. Croxton and Dudley J. Cowden, Practical Business Statistics (New
York: Prentice-Hall, Inc., 1937), p. 153.

3 H. A. Sturges, "The Choice of a Class Interval," Journal of the American Statistical
Association, Vol. XXI (1926), pp. 65-66. Cf. Harold T. Davis and W. F. C. Nelson,
Elements of Statistics (Bloomington, Indiana: The Principia Press, 1935), p. 16.

Width of intervals. There are no arbitrary criteria for determining 
the width of the class intervals in any distribution, but the following 
considerations are pertinent to the problem. 

1. Class intervals should not be so wide that too much of the 
detail of the distribution is lost through grouping. It is true that the 
purpose of the frequency distribution is to summarize and to reduce 
the volume of the data to workable proportions, but the features of 
the data should not be concealed or eliminated through grouping 
in wide intervals. 

2. Little is gained on the other hand by arranging data in very
small intervals, if the number of classes then remains too large to
provide an effective summary.

3. The total number of frequencies in the distribution serves as 
a rough guide to the size of the class intervals 4 that should be em- 
ployed. That is, when there are a great many observations the intervals 
can be relatively small because it will be permissible to have a large 
number of classes. Within the same range of data, if there are only 
a few observations the number of class intervals must be smaller and 
the width of the intervals will be correspondingly greater. 

4. If there is any discernible pattern in the distribution, however, 
it will serve as a much better guide to the size of the classes. For 
instance, if hourly wage rates are being studied, it may be found that 
more men are paid even five- or ten-cent rates than any intervening 
amounts. This pattern must be preserved in the grouped data through 
correct choice of the size of class intervals. The class width should 
be five cents or some multiple of five cents so that there will be an 
equal number of concentration points within each interval. 

5. The class intervals should be chosen so that there will be a 
minimum number of classes that contain no frequencies. 

6. If the distribution which is being constructed is to be compared
with others that are already prepared, the intervals in the new
distribution should be made to conform with those in the previous
distribution. If several different but comparable distributions are being
prepared at the same time, the size of the class intervals must be
established in view of the characteristics of the several distributions.

7. In a given distribution every effort should be exerted to make
all class intervals equal. The distribution of data in unequal intervals
makes analysis difficult and certain kinds of computation impossible.
Although unequal class intervals should not ordinarily be used, there
are certain cases in which they are unavoidable.

a) In some cases where a few high-valued observations are widely
dispersed, they may be grouped in increasingly large class intervals,
so as not to reveal the identity of the individual cases. Unequal class
intervals are frequently employed for this reason by various
government departments. Table 59-A indicates a frequency distribution
of this type.

b) Analysis of the data may indicate that unequal class intervals
define the homogeneity of the observations more accurately than equal
intervals. In Table 59-B for instance, unequal class intervals were used
because the management felt that these divisions of purchases gave
them the most assistance in their merchandising plans.

c) Equal relative increases in the widths of class intervals may
be of more significance to a particular distribution than equal absolute
changes. Consequently, a frequency distribution which appears to have
unequal class intervals may in reality have intervals which are
increasing in size at a uniform rate.

4 Related to the formula on page 356, Sturges recommends the following formula for
the determination of the size of class intervals:

Size of class intervals = (range of data) / (1 + 3.322 log of number of observations)



TABLE 59

A

NUMBER OF BROADCASTING STATIONS IN
THE UNITED STATES, BY ANNUAL
REVENUE RECEIVED, 1935 *

ANNUAL REVENUE          NUMBER OF STATIONS

Less than $10,000               48
$10,000- 24,999                 67
25,000- 49,999                  59
50,000- 99,999                  46
100,000-249,999                 45
250,000-499,999                 17
500,000 and over                 7
Total                          289

* Radio Broadcasting, Census of Business:
1935, p. S3.

B

NUMBER OF PURCHASERS AT THE COLUMBUS
CONSUMERS' COOPERATIVE ASSOCIATION,
BY VALUE OF PURCHASES, JULY 1,
1937, TO DECEMBER 31, 1937 †

VALUE OF PURCHASES      NUMBER OF PURCHASERS

$ 0.00 to $19.99               248
20.00 to 39.99                 140
40.00 to 89.99                 202
90.00 to 149.99                 74
150.00 to 299.99                49
300.00 and over                 11
Total                          724

† Unpublished Study of Patron Purchasing
of the Columbus Consumers' Cooperative
Association, 1938.


8. Finally, the problem of determining the size of the class intervals
in a frequency distribution cannot be separated from that of
establishing the location of the class limits, that is, the values at which
each group in the distribution will begin and end. It is usually necessary
to consider these two points together before arriving at any decision
with regard to either of them.
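
As a rough starting point only, the class-size formula attributed to Sturges in the footnote can be applied to the rent sample, whose range runs from $8 to $95 over 155 observations. The sketch below is an added illustration and not the authors' choice of interval; the function name `sturges_width` is hypothetical and "log" is read as the common logarithm.

```python
import math


def sturges_width(low, high, n_observations):
    """Class width as range / (1 + 3.322 log10 N)."""
    return (high - low) / (1 + 3.322 * math.log10(n_observations))


# Rent sample: lowest rent $8, highest rent $95, 155 families.
width = sturges_width(8, 95, 155)
print(round(width, 1))
```

The result is a little over ten dollars. Under considerations 4 and 8 above, such a figure would still have to be weighed against the pattern of the data and the location of the class limits before any width is adopted.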

Limits of Intervals. As in the determination of the size of class 
intervals, there is no single guide to the location of class interval limits. 
The several general criteria which follow indicate the nature of the 
problems that arise and the solutions that may be employed in various 
types of distributions. 

1. If there is no pattern nor other guide, the very simple procedure
of dividing the range of the data in the distribution by the approximate
number of intervals may be used. For convenience, the actual dividing
points thus established would then be rounded to the nearest whole
numbers. Such a procedure, however, may completely disregard
important characteristics of the data that should be revealed by the
distribution, and should never be employed unless a careful study of the
data has failed to reveal any pattern.

2. If a pattern is discovered, the limits should be set so as to
preserve in each group the characteristics of the individual items, the
same as in determining the width of the intervals. This can be done
by observing the values at which the frequencies are greatest and
establishing the class limits so that these values fall at the midpoints.
For example, in the case quoted of concentration of wage rates at
five-cent intervals, the limits would not be set at 25, 30, 35 cents, etc.,
but at 27.5, 32.5, 37.5, etc., so that the concentration point falls at
the center of each interval. If ten-cent intervals were used the limits
would be set at 27.5, 37.5, etc., or 22.5, 32.5, etc., so that there would
be two concentration points in each interval, each equidistant from
the center and from either end.

3. Even though no special pattern is present, the class limits should
be established so that the value half way between the class limits
approximates the arithmetic average of the observations included in
each class interval. This midvalue of each interval is called the
"midpoint" or the "class mark."

4. When possible, the limits should be chosen so that the midpoints 
are integers. The importance of this guide will become clear in the 
computation of averages from frequency distributions. As in the case 
just cited, it is usually more important to have the midpoint an integer 
than to have the class limits themselves integers. 




5. On some occasions one or both ends of the distribution may
be left "open"; the minimum and maximum values are not shown.
(See Table 59.) These open-end frequency distributions are some- 
times necessary to conceal the identity of the cases at the extremes, 
but the absence of limiting values is a serious handicap in subsequent 
analysis. 
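
The wage-rate example in criterion 2 can be worked out numerically. The sketch below is an added illustration under the assumptions stated in the text (rates concentrated at every five cents, limits starting at 27.5 cents); the function name `class_limits` is hypothetical.

```python
def class_limits(first_lower, width, n_classes):
    """Return (lower, upper, midpoint) for equal-width classes."""
    classes = []
    for i in range(n_classes):
        lower = first_lower + i * width
        upper = lower + width
        classes.append((lower, upper, (lower + upper) / 2))
    return classes


# Five-cent classes starting at 27.5 cents: the concentration points
# 30, 35, and 40 cents fall at the midpoints.
for lower, upper, mid in class_limits(27.5, 5, 3):
    print(lower, upper, mid)

# Ten-cent classes starting at 27.5 cents: each class holds two
# concentration points, 2.5 cents on either side of the midpoint.
for lower, upper, mid in class_limits(27.5, 10, 2):
    print(lower, upper, mid, [mid - 2.5, mid + 2.5])
```

In the ten-cent case the midpoints are 32.5 and 42.5 cents, so the midpoint is no longer an integer; this is the trade-off that criterion 4 asks the reader to keep in mind.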

Designation of Class Limits. The interpretation of the data in a 
frequency distribution and the evaluation of their accuracy depend 
largely upon the precise designation of the class limits. The method 
of designation may in turn depend upon the nature of the data 
involved. 

Discrete or continuous data: An important consideration is whether 
the data are discrete or continuous. Discrete data are those which 
occur only at exact values at regular intervals but never at any 
intervening values. For example, stock prices are quoted in eighths
of a point. With the exception of a few special listings, no stock
would be quoted at any price between two adjacent eighths, such as
between 1/8 and 1/4.
Likewise a classification according to number of employees could never 
be anything except whole numbers. In the latter case the classes would 
naturally be designated as 1 to 10, 11 to 20, etc., and no question 
could arise as to any fractional value between 10 and 11. 

Continuous data, on the other hand, are those which may occur
at every conceivable point along a continuous scale of values. This
distinction between measured values and separate items and the
methods for handling each statistically will be discussed in greater
detail in chapter XVII. As a matter of fact, a classification in discrete
units is much more puzzling to handle correctly in computation but,
when class limits are being designated, continuous data afford a
greater variety of alternative methods.

Examples of methods: Some of the methods for designating class 
limits are better than others in clarifying the actual limits that are 
employed. One of the most common methods has been shown in 
Table 59. Other methods are illustrated in Figure 51.

Of the four methods, the one shown in Figure 51-A is the poorest 
for it may be ambiguous. It is not clear whether exactly $500 is 
included in the first or the second interval. In spite of this weakness, 
this form is widely used and is ordinarily interpreted as $250 and 
under $500. 

FIGURE 51

METHODS OF DESIGNATING CLASS LIMITS

      A                      B                      C

$  250-$  500       Under 30 cents              $ 51-$100
   500-   750       30 and under 35 cents        101- 150
   750- 1,000       35 and under 40 cents        151- 200
 1,000- 1,250       40 and under 45 cents        201- 250
 1,250- 1,500       45 and under 50 cents        251- 300
 1,500- 1,750       50 and under 55 cents        301- 350
 1,750- 2,000       55 and under 60 cents        351- 400

Figure 51-C differs from the form in Table 59-A only in the location
of the class limits: in one the round hundreds and fifties are the
upper values of the classes, whereas in the other various multiples of
round thousands are the lower values of the classes. Either of these
forms can be used, Figure 51-C being preferred for discrete data, and
Table 59-A for continuous data. The purpose in each case is to make
the class intervals equal. Discrete data, which usually start at a value
of one, must read 1-50, 51-100, etc., whereas continuous data are
measured from zero and read 0-99, 100-199, etc.

In the case of continuous data, the person who prepares the table 
must make a decision regarding significant figures, whether he uses 
the form in Table 59-A, 59-B, or even Figure 51-B. In the first case 
the values are rounded at dollars, so that presumably any value up to 
$9,999.50 would go in the first class, and $9,999.50 and over in the 
second, etc. Similarly in Table 59-B the dividing line is $19.995. 

The designation in Figure 51-B indicates an indefinite number of 
decimal places, although in actual practice the dividing point between 
classes would seldom be carried any farther than a half cent. 

Another common method of indicating class values, especially when 
the intervals are quite small, is by the value of the midpoint, as 
average grade, 75, 80, 85, 90, where 80 per cent includes everything 
from 77.5 to less than 82.5, etc. In other cases, classes that are listed 
in this way may represent single unit values of a discrete series. 

Of all these possible methods for designating class limits, Figure 
51-B is the least ambiguous. The units in which any particular data 
are expressed will ordinarily make clear to the reader to how many 
significant figures the class limits have been carried. This method 
requires more space for the stub of the table but it is nevertheless 
the method preferred by the authors. 

An Example of the Preparation of a Frequency Table 

The principles set forth in the preceding section will now be applied 
in the preparation of a frequency distribution of the sample of rents 
paid in Columbus, Ohio. The preliminary preparation of these data 
was carried out earlier in the chapter, leaving them in the form found 
in Table 58. The next step is to determine the number of class 
intervals, the width of interval and the interval limits that will pro- 
duce a concise and effective table. This means a table that is compact 
enough to be grasped quickly, and with the details arranged so that no 
essential characteristic of the data will be lost. 

The range of $87 between the lowest and highest rents paid imme- 
diately suggests the use of nine $10 intervals. Nine intervals for 
155 items appears reasonable and the $10 width is convenient. The 
most important consideration, however, is the existence in the distribu- 
tion of concentration points at the $5 and $10 rents, i.e., the tendency 
to fix rents at $25, $30, $35, $40, etc. This means that the width 
of the class interval must be $5 or some multiple of $5. A $5 
interval would give too many classes, a $15 interval too few, hence 
$10 emerges as the proper width to use and the number of intervals 
is automatically set at nine, if the first class is set at $7.50 to $17.50. 
The first class might also read $2.50-$12.50, so that the last of ten 
classes would read $92.50-$102.50. There is no general rule that 
requires the use of either one or the other of these systems of intervals 
and the distribution of frequencies will, of course, be different accord- 
ing to which is employed. Perhaps the best plan is to regroup the 
initial $5 intervals in $10 intervals according to both systems and then 
select the one that gives the smoother distribution or appears to be 
the better description of the data. In the distribution of rents the 
$7.50-$17.50 set of intervals seems preferable. 

The same circumstance that led to the selection of $10 intervals 
also becomes the guide to the proper class limits. Each class will con- 
tain two points of rent concentration. These must fall at equal dis- 
tances from the center and ends of the class in order to meet the 
requirement that the average value of the items included in any class 
shall be approximately equal to the midpoint of that class. Hence 
the first class containing the $10 and $15 concentration points must 
have its midpoint at $12.50. The next containing the $20 and $25 
concentration points must have its midpoint at $22.50 and so on. 
The class limits, therefore, must be $7.50, $17.50, $27.50, . . . , $97.50. 
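The rule just stated can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption that concentration points fall at every $5 from $10 through $95 are ours, not the authors'.

```python
# Sketch: derive $10 class limits from the rent concentration points.
# Assumption (ours): concentration points fall at every $5 from $10 to $95.

def classes_from_points(points, per_class=2):
    """Group consecutive concentration points and centre each class on its group."""
    spacing = points[1] - points[0]      # $5 between concentration points
    width = spacing * per_class          # $10 class interval
    classes = []
    for start in range(0, len(points) - per_class + 1, per_class):
        group = points[start:start + per_class]
        midpoint = sum(group) / len(group)   # class mark midway between the points
        classes.append((midpoint - width / 2, midpoint, midpoint + width / 2))
    return classes

points = list(range(10, 100, 5))         # 10, 15, ..., 95
classes = classes_from_points(points)
for lower, mark, upper in classes:
    print(f"${lower:5.2f} and under ${upper:5.2f}   class mark ${mark:5.2f}")
```

Each class produced this way holds two concentration points at equal distances on either side of its class mark, which is the requirement described in the text.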

Actual Frequencies. The three parts of Table 60 contain different 
distributions resulting from the use of three distinct sets of $10 class 
intervals. That is, A, B, and C are independent distributions each 
of which has been constructed from Table 58. Column 1 records the 
number of rents falling within the limits indicated for the several 
classes in each of the three distributions. In distributions A and B 
the two concentration points fall at the beginning and center of the 
intervals. In distribution C, however, the center of the interval or class 
mark lies midway between the two concentration points. 

TABLE 60

THREE FREQUENCY DISTRIBUTIONS OF MONTHLY RENTS PAID BY 155 FAMILIES
IN A CONSUMER SURVEY IN COLUMBUS, OHIO

                                  (1)          (2)           (3)
     CLASS INTERVAL            FREQUENCY      CLASS     CLASS INTERVAL
                                              MARK         AVERAGE

                       Frequency Distribution A

$ 5    and under $ 15              7         $ 10          $11.0
 15    and under   25             26           20           18.5
 25    and under   35             26           30           28.2
 35    and under   45             42           40           36.7
 45    and under   55             19           50           49.6
 55    and under   65             10           60           59.0
 65    and under   75              6           70           67.2
 75    and under   85             13           80           77.3
 85    and under   95              4           90           86.2
 95    and under  105              2          100           95.0

      Total                      155

                       Frequency Distribution B

$ 0    and under $ 10              3         $  5          $ 8.7
 10    and under   20             20           15           15.8
 20    and under   30             21           25           23.6
 30    and under   40             43           35           33.3
 40    and under   50             18           45           41.3
 50    and under   60             17           55           51.2
 60    and under   70             12           65           61.9
 70    and under   80              9           75           73.9
 80    and under   90              9           85           81.7
 90    and under  100              3           95           93.3

      Total                      155

                       Frequency Distribution C

$ 7.50 and under $17.50           16         $12.50        $13.6
 17.50 and under  27.50           27          22.50         22.0
 27.50 and under  37.50           44          32.50         33.2
 37.50 and under  47.50           17          42.50         40.9
 47.50 and under  57.50           18          52.50         51.0
 57.50 and under  67.50           11          62.50         61.4
 67.50 and under  77.50           10          72.50         73.3
 77.50 and under  87.50            9          82.50         81.7
 87.50 and under  97.50            3          92.50         93.3

      Total                      155

Columns 2 and 3 have been added to Table 60 to demonstrate 
the superiority of distribution C. The class marks in column 2 conform 
to the definition previously given. The class averages in column 3 
are obtained from the array in Table 57. For example the seven rents 
recorded between $5 and $15 total $77 or an average of $11. Parallel 
computations for each class of each distribution lead to the averages 
as recorded. Comparison of column 3 with column 2 shows that in 
distribution A the averages are less than the class marks in all classes 
except the first and that the differences are appreciable except in the 
"$45 and under $55" class. Likewise, in distribution B the averages 
are below the class marks except in the first and second classes. In 
distribution C four averages are above and five below their respective 
class marks and the differences between the averages and the class 
marks are small. Distribution C, therefore, meets the requirement that 
class marks should approximate the actual averages of the items in- 
cluded, whereas the other two distributions contain a definite bias. 
This bias would have an adverse effect upon any numerical measures 
computed from those distributions. 
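The comparison of columns 2 and 3 can be checked mechanically. The following sketch uses the figures of Table 60 as printed; the variable and function names are illustrative only. It counts the classes whose average exceeds the class mark and contrasts the mean computed from class marks with the mean computed from the actual class averages.

```python
# Table 60 as printed: (frequencies, class marks, class averages).
TABLE_60 = {
    "A": ([7, 26, 26, 42, 19, 10, 6, 13, 4, 2],
          [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
          [11.0, 18.5, 28.2, 36.7, 49.6, 59.0, 67.2, 77.3, 86.2, 95.0]),
    "B": ([3, 20, 21, 43, 18, 17, 12, 9, 9, 3],
          [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
          [8.7, 15.8, 23.6, 33.3, 41.3, 51.2, 61.9, 73.9, 81.7, 93.3]),
    "C": ([16, 27, 44, 17, 18, 11, 10, 9, 3],
          [12.5, 22.5, 32.5, 42.5, 52.5, 62.5, 72.5, 82.5, 92.5],
          [13.6, 22.0, 33.2, 40.9, 51.0, 61.4, 73.3, 81.7, 93.3]),
}

def summarize(freqs, marks, averages):
    """Return (classes with average above mark, mean from marks, mean from averages)."""
    n = sum(freqs)
    above = sum(1 for m, a in zip(marks, averages) if a > m)
    mean_from_marks = sum(f * m for f, m in zip(freqs, marks)) / n
    mean_from_averages = sum(f * a for f, a in zip(freqs, averages)) / n
    return above, mean_from_marks, mean_from_averages

summary = {name: summarize(*cols) for name, cols in TABLE_60.items()}
for name, (above, by_marks, by_avgs) in summary.items():
    print(f"{name}: {above} averages above their class marks; "
          f"mean from marks {by_marks:.2f}, from class averages {by_avgs:.2f}")
```

In this sketch the mean from class marks overstates the mean from class averages by about two dollars in distributions A and B, and by only a few cents in distribution C, which is the bias the text describes.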

If the size of the rent sample were increased to several thousand 
items, the averages and class marks in distribution C would tend to 
coincide, but the bias would persist in distributions A and B. For this 
reason the characteristics of the universe "rents in Columbus, Ohio" 
can be studied from distribution C only. 

Percentage Frequencies. Percentage frequencies are preferable to 
actual frequencies for some purposes. Table 61 (which has been set 
up with title and headings such as would be used in a presentation 
table, in contrast to the work-table headings of Table 60) shows both 
the actual frequencies from Table 60-C, and their percentage distribu- 
tion. Two major uses of percentage frequencies should be mentioned: 
(1) the comparison of the individual frequencies with each other and 
with the total, and (2) comparisons between two or more distributions 
having the same or equivalent class intervals. Thus from Table 61, 
column 2, it is apparent that more than one-fourth of the rents were 
between $27.50 and $37.50, that less than one-fourth were above 
$57.50 and that more than one-fourth were less than $27.50. The 
advantages of the percentage frequencies in comparing two distribu- 
tions graphically will be shown later in the chapter. 
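As a rough illustration, the percentage frequencies can be derived from the actual frequencies of Table 60-C as follows. The variable names are ours, and rounding to one decimal place is an assumption about how the percentage column was prepared.

```python
# Actual frequencies of Table 60-C, lowest class first.
freqs = [16, 27, 44, 17, 18, 11, 10, 9, 3]
total = sum(freqs)

# Percentage distribution, rounded to one decimal place.
percentages = [round(100 * f / total, 1) for f in freqs]

# The three statements made in the text, expressed as shares of the total.
share_27_50_to_37_50 = freqs[2] / total     # the $27.50 and under $37.50 class
share_above_57_50 = sum(freqs[5:]) / total  # the four highest classes
share_below_27_50 = sum(freqs[:2]) / total  # the two lowest classes

print(percentages)
print(f"{share_27_50_to_37_50:.1%} {share_above_57_50:.1%} {share_below_27_50:.1%}")
```

Comparing each share with one-fourth reproduces the three observations drawn from column 2.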

TABLE 61

NUMBER AND PERCENTAGE DISTRIBUTION OF FAMILIES IN COLUMBUS, OHIO,
ACCORDING TO VALUE OF MONTHLY RENTALS PAID

                                        FAMILIES
     RENTALS PAID                (1)            (2)
                                Number       Percentage
                                            Distribution

$ 7.50 and under $17.50           16            10.3
 17.50 and under  27.50           27            17.4
 27.50 and under  37.50           44            28.4
 37.50 and under  47.50           17            11.0
 47.50 and under  57.50           18            11.6
 57.50 and under  67.50           11             7.1
 67.50 and under  77.50           10             6.5
 77.50 and under  87.50            9             5.8
 87.50 and under  97.50            3             1.9

      Total                      155           100.0

GRAPHS OF FREQUENCY DISTRIBUTIONS 

The frequency distribution is a separation of a whole into parts, 
the frequencies being merely a record of the number of individual 
items falling in each quantitative class of the distribution. The major 
purpose in presenting a distribution graphically is to emphasize the 
relation of the parts to the total and to each other. 

The graphic methods used in presenting frequency distributions are 
perhaps more standardized than in the case of any other kind of 
statistical data. The forms have become so widely accepted that it is 
necessary to follow without noticeable deviation the generally accepted 
rules for their construction. This does not imply, however, that we may 
not inquire into the underlying principles that have led to the develop- 
ment and universal acceptance of these methods. 

Construction and General Characteristics 

As was indicated at the end of chapter XIII, frequency distribution 
graphs are of the two-dimensional variety. The class intervals are 
always plotted on the horizontal axis and the frequencies on the vertical 
axis. Ordinary arithmetic scales are used on both, except for some 
very specialized types of distribution which are excluded from the 
present discussion. The vertical scale must always begin at zero, but 
the horizontal scale need include only the range of the class values, 
plus an extra interval at either end. The two most common frequency 
diagrams are the histogram and the frequency polygon.⁵ 

⁵ A third diagram, the smooth curve, is often discussed along with the histogram and 
the frequency polygon. It is a trace of the form which the frequency distribution would 
take if a very large number of cases were included and class intervals became infinitesimally 
small. From the point of view of universe and sample this concept is of considerable 
importance and will be employed in chapter XXVIII. The smooth curve has little 
application at the elementary level; hence no further reference will be made to it in 
this chapter. 

The Histogram. This form of diagram consists of contiguous rec- 
tangles, or columns, ranged along the base scale, the height of each 
one being determined by the number of frequencies in the class upon 
which it stands. The total combined area of all the columns represents 
the total number of frequencies in the distribution. It may be consid- 
ered that each column is like a pile of coins, each coin representing 
a single frequency. The thickness of the coin equals the value of one 
frequency on the vertical scale, and its diameter corresponds to the 
width of the class interval. Viewed from the front each coin occupies 
a narrow rectangular space. Several such adjacent piles of 
different heights would look very much like a frequency histogram. 
If a few coins were moved from one pile to another the total front 
view area, representing the total number of coins or frequencies, would 
remain the same regardless of changes in the distribution. 

Figure 52-A which represents the rent data distribution of Table 
60-C, illustrates the important features of all histograms. The greatest 
concentration of frequencies is at once apparent from the location 
of the tallest column on the base $27.50 to $37.50. The other columns 
start from zero frequency on the left and gradually increase in height 
as they approach this class of maximum frequency, while those on 
the right fall away from it and finally reach zero again. This is the 
characteristic shape of a frequency graph portraying the chance occur- 
rence of a set of homogeneous events. Variations from this usual 
shape will be discussed later in the chapter. 

Another point to be noted from the histogram is that each column 
rests not upon a single point but upon the entire interval included 
within the class limits. This indicates that the frequencies in any 
interval are spread over that interval, and that the base scale values 
occur in a continuous sequence. 

The Frequency Polygon. This form of diagram is illustrated in 
Figure 52-B, using the same data as in 52-A. The histogram has 
been lightly blocked in as a background to show that the polygon 
can be drawn by connecting the midpoints of the successive columns. 
It could equally well be drawn without the histogram, by plotting 
points measured from the abscissas at the midpoints of the class 
intervals and the ordinates of the corresponding frequencies. The line 
connecting these points is extended to zero at the midpoint of the 
class at either end beyond the range of frequencies; thus the broken 
line or "curve," together with the base line, forms a "polygon" inclosing 
the entire area of frequencies. 



FIGURE 52

TWO TYPES OF FREQUENCY DIAGRAM OF RENT DATA

[Figure: A. Histogram; B. Frequency Polygon. Vertical axis: number of rentals (frequencies), 0 to 40. Horizontal axis: dollars of rent paid (class intervals), $7.50 to $107.50.]

Data from Table 60-C. 

It can be demonstrated that the total area of the polygon is exactly 
equal to that of the histogram, although the area included in each class 
has been slightly altered. When the midpoint of the rectangle standing 
on the base, $17.50 to $27.50, is joined with the midpoint of the 
rectangle standing on the base, $27.50 to $37.50, the triangle CDH 
is added to the area on the $17.50 to $27.50 base and the triangle 
ABH is removed from the area on the base $27.50 to $37.50. But 
the areas of the triangles are equivalent (AB = CD and angle a = 
angle b), therefore the area of the figure X2CBX4 is equal to the area 
of the figure X2CDABX4. A similar argument for each adjacent pair 
of rectangles proves that the area between the polygon and the base 
line is the same as the sum of the areas of the rectangles. Hence the 
polygon is a smooth trace of the histogram in which the total area is 
preserved but the idea of graduated increase and decrease in the 
frequencies is substituted for the steps of the histogram. 

It must be noted, however, that (unless by chance points G-C-B 
lie in a straight line) triangle CDH which has been added to the class 
$17.50 to $27.50 is not equivalent to triangle CEK which has been 
removed from it, and that area X1KCHX3 is therefore not equal to 
area X1EDX3. 

This fact leads to the conclusion that the histogram is the more 
appropriate form to use when it is necessary to represent exactly the 
number of frequencies in each class, whereas the polygon gives a better 
picture when a smoothed distribution is wanted. 
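The two area statements can be verified numerically for the rent data. The sketch below is an illustration under our own naming: it builds the polygon from the class marks of Table 60-C, closes it at the midpoints of the empty classes on either side, and compares areas, measured in frequency times dollars.

```python
# Table 60-C: $10 classes beginning at $7.50.
freqs = [16, 27, 44, 17, 18, 11, 10, 9, 3]
LOWER, WIDTH = 7.5, 10.0

mids = [LOWER + WIDTH * (i + 0.5) for i in range(len(freqs))]
# Polygon vertices, closed at zero one class beyond each end of the range.
xs = [mids[0] - WIDTH] + mids + [mids[-1] + WIDTH]
ys = [0] + freqs + [0]

def polygon_height(x):
    """Height of the frequency polygon at x, by straight-line interpolation."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 0.0

def polygon_area_in_class(lower, upper):
    """Area under the polygon within one class (two trapezoids meeting at the class mark)."""
    mid = (lower + upper) / 2
    h_lo, h_mid, h_hi = (polygon_height(x) for x in (lower, mid, upper))
    half = (upper - lower) / 2
    return half * (h_lo + h_mid) / 2 + half * (h_mid + h_hi) / 2

histogram_area = sum(f * WIDTH for f in freqs)
polygon_area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
                   for i in range(len(xs) - 1))

class_rectangle = freqs[1] * WIDTH                 # $17.50 to $27.50 column
class_polygon = polygon_area_in_class(17.5, 27.5)  # same base under the polygon

print(histogram_area, polygon_area)    # total areas agree
print(class_rectangle, class_polygon)  # areas within the single class differ
```

The totals agree, while the area standing on the $17.50 to $27.50 base is larger under the polygon than in the histogram, as the argument above requires.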

Uses of Each Type of Graph 

For the majority of frequency distributions of economic or social 
data, that is, for collected data that are in reality samples taken from 
a larger universe, either type of diagram may be employed. However, 
because the polygon smooths the contour of a distribution while 
maintaining the total area, it is suitable only for data that are 
continuous. 

Continuous Data. In the rent distribution suppose that instead of 
155 items the sample had been doubled giving a total of 310 rentals, 
its representative character, of course, being preserved. If the number 
of class intervals were then doubled and the width of each interval 
reduced to five dollars,⁶ the resulting histogram would retain the 
general shape of Figure 52-A, but due to the operation of the principle 
of statistical regularity some of the variability of sampling would be 
removed. Consequently the new histogram would resemble the polygon 
in Figure 52-B more closely than does the original histogram. If the 
cases in any such distribution could be multiplied indefinitely, and the 
class widths decreased accordingly, the final contour would be prac- 
tically identical whether drawn as a histogram or a polygon. This 
illustrates the assumption underlying the drawing of a polygon: it 
interpolates from the sample data the probable intervening values of 
the universe. Thus it gives a description of the universe derived from 
the information supplied by a single sample of continuous data. 

⁶ Further reduction of the width of the interval would not be possible in this distri- 
bution, regardless of the size of the sample, because of the concentration of the actual 
amounts on the five-dollar values, $35, $40, etc. 



This does not mean, however, that each point of the polygon can 
be read as an actual frequency unless the total frequency is infinite 
and the width of each class interval infinitesimal. For example, in 
the rent polygon there might be a natural tendency to think of every 
point on the polygon as representing a number of frequencies, a ten- 
dency to think of 35 rentals, for instance, at $27.50. This is erroneous 
because in the polygon as in the histogram there are only 35.5 cases 
between $22.50 and $32.50. It is correct to say that at the $27.50 
level rentals occurred at the rate of 35 per $10 interval. 
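A short computation, offered only as a sketch with names of our own choosing, shows why the ordinate at $27.50 is a rate and not a count. It uses the two classes of Table 60-C that meet at $27.50; the ordinate of about 35 read from the graph comes out as 35.5 by straight-line interpolation.

```python
WIDTH = 10.0                 # dollars per class interval
f_left, f_right = 27, 44     # $17.50-$27.50 and $27.50-$37.50 (Table 60-C)

# Polygon ordinate at the shared class limit, halfway between the class marks.
ordinate_at_limit = (f_left + f_right) / 2

# Cases lying between the class marks $22.50 and $32.50.
cases_histogram = 0.5 * f_left + 0.5 * f_right            # half of each column
cases_polygon = (WIDTH * (f_left + f_right) / 2) / WIDTH  # trapezoid area / class width

# The ordinate is a density: cases per $10 interval, or per dollar of rent.
rate_per_dollar = ordinate_at_limit / WIDTH

print(ordinate_at_limit, cases_histogram, cases_polygon, rate_per_dollar)
```

The ordinate describes how thickly the cases are spread over the whole $10 span centered on $27.50; it does not say that so many families paid exactly $27.50.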

TABLE 62

NUMBER OF EMPLOYEES IN NEW JERSEY LAUNDRIES, NOVEMBER, 1936*

   NO. OF EMPLOYEES           NO. OF LAUNDRIES

  1 and less than  10                16
 10 and less than  25                41
 25 and less than  50                24
 50 and less than 100                 7
100 and less than 200                 4
200 and less than 300                 4
300 and less than ^00                 2

      Total                          98

* Monthly Labor Review, United States Department of Labor (October, 1937), p. 888. 

Discrete Data. In dealing with discrete data, a polygon would 
incorrectly indicate intervening values that could not possibly exist; 
consequently the histogram must be used. Each class of a discrete 
distribution may include only one unit, or several discrete units grouped 
together. 

An example of the latter is shown in Table 62, number of laundries 
employing a specified number of workers. As grouped in this table, 
each class contains several distinct sizes of laundry. Therefore the 
columns of a histogram which indicate that the frequencies in each 
group are distributed over the entire class would give a correct repre- 
sentation. However, these data might be further broken down into 
the number of laundries employing 8, 9, 10, etc., workers, each class 
having a width of only one natural unit (the individual worker). 
In this case the discrete nature of the data would be more accurately 
represented by separate bars erected at the midpoint of each class 
interval. The frequencies are all concentrated at these points, and 
not distributed from 8.5 to 9.5, 9.5 to 10.5, etc. 

It should be noted that this is the only situation in which separated 
bars may be used in a frequency diagram. The adjacent columns of 
a histogram do resemble bars in a general way, but are of an entirely 
different nature. The area of the columns of a histogram is important, 
whereas in a bar diagram only the height is measured. When the 
class intervals are equal, as in all the illustrations up to this point, 
the heights of the columns are in the same proportion to one another 
as their areas, because all of their bases are equal. In the case of 
unequal class intervals, however, the bases of the columns are unequal, 
hence the areas are not in direct proportion to their heights. This 
point will be explained later in the chapter. 

TABLE 63

NUMBER OF JUNIOR DRESSES SOLD DURING MONTH OF FEBRUARY, ACCORDING TO SIZE*

     SIZE           NUMBER OF DRESSES SOLD

       9                     171
      11                   1,082
      13                   1,676
      15                   1,335
      17                     384

     Total                 4,648

* Confidential information from a Buffalo department store. 

Another example of a discrete natural unit is shown in Table 63. 
Sizes of junior dresses are discrete classes, hence the use of separated 
bars is warranted, as shown in Figure 53-A. However, in order to give 
the impression of area representing the total number of dresses sold, 
the use of the histogram, Figure 53-B, is more common. In this case, 
the odd-numbered size is the natural unit and is just as indivisible as 
was the individual employee in the preceding example. There are no 
junior dress sizes other than those given in Table 63,⁷ so that the 
number of classes in such a distribution could not be increased nor 
the class width decreased, no matter how many cases might occur 
in the sample. 

FIGURE 53

FREQUENCY DIAGRAMS OF DISCRETE DATA: NUMBER OF DRESSES SOLD IN JUNIOR SIZES

[Figure: panel A, separated bars; panel B, histogram. Vertical axis: number sold, 0 to 1,600. Horizontal axis: sizes 9 to 17.]

Data from Table 63. 

Any business which sells size merchandise such as men's shirts, 
gloves, shoes, etc., can utilize this type of analysis in controlling 
its purchases and inventory. From his past records of regular sales 
(not including year-end or clearance sales) a merchant can prepare 
frequency distributions and histograms of sizes of merchandise as 
guides to his next year's purchases or to the maintenance of his regu- 
lar stocks. The same distributions and graphs would not be useful 
in other stores or for other neighborhoods, for the distribution of sizes 
sold by a particular merchant is peculiar to the characteristics of his 
clientele. The distributions will be different, depending upon age, 
nationality, economic status, occupations, and other characteristics of 
the people who purchase in each neighborhood or at specific stores. 

Adaptations of Frequency Graphs 

The Cumulative Graph: Ogive. For some kinds of analysis and 
description the cumulative frequency distribution and curve (usually 
called ogive)⁸ are of more value than the forms just described. A 
cumulative frequency distribution can be constructed from an ordinary 
frequency distribution by adding the frequencies of successive class 
intervals, beginning at the smallest (the largest) class of the distribution, 
and showing each of these successive totals as the number of cases 
which is smaller than (greater than) the value of the upper (lower) 
class limit at that point. Table 64, which contains two types of cumu- 
lative distributions constructed from Table 61, shows how they are 
derived. The frequencies are cumulated from the lower limit to the 
upper limit of the table in column 2, and from the upper limit to the 
lower limit in column 3. Table 65 is the presentation form for 
the results demonstrated in Table 64. The information obtainable 
from Table 65 is in more usable form than that provided by Table 61. 
For instance, without any kind of arithmetic treatment, it is imme- 
diately obvious from column 1 that more than half of the families 
in the sample (87) paid monthly rentals of less than $37.50, and from 
column 3 that about one-fifth of the group (33) paid rentals of $57.50 
per month or over. 

⁷ The even numbered sizes are used in a different classification, misses' dresses. 

⁸ The name ogive is an architectural term given to the rib of a pointed vault or gothic 
arch, which has the same shape as this type of curve. 

TABLE 64

CUMULATIVE FREQUENCY DISTRIBUTIONS OF MONTHLY RENTALS PAID
BY 155 FAMILIES IN COLUMBUS, OHIO

                                  (1)            (2)              (3)
     CLASS INTERVAL            FREQUENCY      CUMULATIVE       CUMULATIVE
                                              FREQUENCY,       FREQUENCY,
                                              LESS THAN        LOWER LIMIT
                                              UPPER LIMIT      AND ABOVE

$ 7.50 and under $17.50           16              16              155
 17.50 and under  27.50           27              43              139
 27.50 and under  37.50           44              87              112
 37.50 and under  47.50           17             104               68
 47.50 and under  57.50           18             122               51
 57.50 and under  67.50           11             133               33
 67.50 and under  77.50           10             143               22
 77.50 and under  87.50            9             152               12
 87.50 and under  97.50            3             155                3

      Total                      155
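The cumulation described above amounts to two running totals, one taken from each end of the distribution. A minimal sketch, using the frequencies of Table 60-C and variable names of our own choosing:

```python
from itertools import accumulate

# Actual frequencies, lowest class ($7.50 and under $17.50) first.
freqs = [16, 27, 44, 17, 18, 11, 10, 9, 3]

# "Less than upper limit": running total from the smallest class upward.
less_than = list(accumulate(freqs))

# "Lower limit and above": running total from the largest class downward,
# then reversed so that it lines up with the class order.
more_than = list(accumulate(reversed(freqs)))[::-1]

for f, lt, mt in zip(freqs, less_than, more_than):
    print(f"{f:4d} {lt:6d} {mt:6d}")
```

The two lists correspond to the two cumulative columns of the table, and in every row the frequencies below the lower limit and those at or above it account for all 155 cases.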

The curves of these two types of cumulative frequency distributions 
are shown in Figure 54. Curve L represents the "less than" distribu- 
tion and curve M the "more than" distribution. It should be noted 
that in Figure 52-B the midpoints of the class intervals were joined 
to form the frequency polygon, whereas in the ogives the end values 
are joined. In the "less than" ogive there are, for example, sixteen 
cases below $17.50; therefore the frequency 16 is plotted at the upper 
limit of the $7.50 to $17.50 class. Similarly the next frequency, 43, 
is plotted at $27.50, etc. 

TABLE 65

NUMBER AND PER CENT OF FAMILIES IN COLUMBUS, OHIO, PAYING
MORE THAN AND LESS THAN A SPECIFIED MONTHLY RENTAL

                           FAMILIES                                FAMILIES
   RENTALS PAID         (1)       (2)        RENTALS PAID       (3)       (4)
                       Number   Per Cent                       Number   Per Cent

Less than $17.50         16       10.3     $ 7.50 and more      155      100.0
Less than  27.50         43       27.7      17.50 and more      139       89.7
Less than  37.50         87       56.1      27.50 and more      112       72.3
Less than  47.50        104       67.1      37.50 and more       68       43.9
Less than  57.50        122       78.7      47.50 and more       51       32.9
Less than  67.50        133       85.8      57.50 and more       33       21.3
Less than  77.50        143       92.3      67.50 and more       22       14.2
Less than  87.50        152       98.1      77.50 and more       12        7.7
Less than  97.50        155      100.0      87.50 and more        3        1.9

FIGURE 54

OGIVES: CUMULATIVE FREQUENCY DIAGRAM OF RENT DATA

[Figure: curve L (less than upper limits) and curve M (lower limits and above), with the median marked at their intersection. Left vertical axis: families (absolute frequencies), 0 to 155. Right vertical axis: families (per cent), 0 to 100. Horizontal axis: dollars of rent paid, $7.50 to $97.50.]

Data from Table 65. 

There are two major characteristics of the ogive, the most important 
of which is the ease of interpolation which its use permits. In order 
to determine the number of cases in which less than a given amount, 
say $40, is paid for rent, it is only necessary to make a vertical ruling 
from $40 on the horizontal scale to ogive L and then from this point 
of intersection make a horizontal ruling to the Y axis. This indicates 
a frequency of approximately 91 families who pay less than $40 per 
month. The second important characteristic is the slope of the ogive. 
Where the slope is steepest there is the greatest concentration of 
frequencies, and wherever it is less steep there are fewer frequencies. 
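The graphic reading can be imitated by straight-line interpolation between the plotted points of ogive L. The sketch below assumes the ogive is drawn with straight segments between class limits; the function and variable names are illustrative.

```python
from bisect import bisect_right

# Points of ogive L: class limits and the number of cases below each limit.
limits = [7.5 + 10 * i for i in range(10)]             # 7.50, 17.50, ..., 97.50
cum_less = [0, 16, 43, 87, 104, 122, 133, 143, 152, 155]

def cases_below(rent):
    """Number of cases paying less than `rent`, read off the straight-line ogive."""
    if rent <= limits[0]:
        return 0.0
    if rent >= limits[-1]:
        return float(cum_less[-1])
    i = bisect_right(limits, rent) - 1                 # segment containing `rent`
    fraction = (rent - limits[i]) / (limits[i + 1] - limits[i])
    return cum_less[i] + fraction * (cum_less[i + 1] - cum_less[i])

# Slope of each segment, in cases per dollar of rent.
slopes = [(cum_less[i + 1] - cum_less[i]) / (limits[i + 1] - limits[i])
          for i in range(len(limits) - 1)]
steepest = slopes.index(max(slopes))

print(cases_below(40))
print(limits[steepest], limits[steepest + 1])
```

The value at $40 agrees with the reading of approximately 91 families, and the steepest segment stands on the class of greatest frequency, $27.50 to $37.50.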
The cumulative distribution and the ogive are often presented on 
a percentage basis in practical work. To illustrate this usage, col- 
umns 2 and 4 have been included in Table 65. In Figure 54 the scale 
at the right has been so arranged that 100 per cent is in the same 
position as 155 on the left-hand scale. The ogives L and M represent 
respectively either columns 1 and 3 or columns 2 and 4 of Table 65. 
Information concerning the distribution of rents in the sample can 
be obtained by reading the left scale as previously indicated. The 
percentage scale is independent of the actual number of cases in the 
sample. From it can be read facts concerning the distribution of rents 
in Columbus, on the assumption that the sample is representative 
of the entire city. Thus ogive L shows that 25 per cent of Columbus 
families pay monthly rents of not more than $26.50 and ogive M 
shows that 25 per cent pay at least $54.00. The point of intersection 
of the two ogives at a frequency of 50 per cent shows that half of the 
families pay more and half less than about $35.25 per month. 
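Readings of this kind can be approximated by interpolating in the opposite direction, from a cumulative percentage to a rent. The following is a sketch only: it assumes straight segments between the plotted points of ogive L, so its results are close to, but not identical with, the values read from the printed graph.

```python
# Points of ogive L, with cumulative frequencies expressed as per cent of 155.
limits = [7.5 + 10 * i for i in range(10)]             # 7.50, 17.50, ..., 97.50
cum_less = [0, 16, 43, 87, 104, 122, 133, 143, 152, 155]
cum_pct = [100 * c / cum_less[-1] for c in cum_less]

def rent_at_percent(pct):
    """Rent below which `pct` per cent of the cases fall (straight-line ogive)."""
    for i in range(len(cum_pct) - 1):
        if cum_pct[i] <= pct <= cum_pct[i + 1]:
            fraction = (pct - cum_pct[i]) / (cum_pct[i + 1] - cum_pct[i])
            return limits[i] + fraction * (limits[i + 1] - limits[i])
    raise ValueError("percentage must lie between 0 and 100")

lower_quarter = rent_at_percent(25)   # 25 per cent pay not more than this
median_rent = rent_at_percent(50)     # half pay more, half pay less
upper_quarter = rent_at_percent(75)   # 25 per cent pay at least this

print(round(lower_quarter, 2), round(median_rent, 2), round(upper_quarter, 2))
```

Reading "25 per cent pay at least" from ogive M is the same as reading 75 per cent on ogive L, which is why a single function serves for all three figures.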

The ogive has an additional use in the graphic determination of 
measures of central tendency and dispersion. In particular this diagram 
will be referred to in chapter XVII. 

Histogram of Unequal Classes. In the discussion of the principles 
for constructing frequency distributions provision was made for unequal 
class intervals under certain conditions. The graph of such a distribu- 
tion requires special explanation because the use of the methods 
previously described would produce definite misrepresentation. 

TABLE 66

HOURLY WAGE RATES PAID TO NEWLY HIRED EMPLOYEES BY 52 INDUSTRIAL CONCERNS
IN BUFFALO, NEW YORK, IN 1940*

      DISTRIBUTION A                          DISTRIBUTION B
      EQUAL INTERVALS                        UNEQUAL INTERVALS

                    (1)                          (2)               (3)
 Hourly Wage       No. of       Hourly Wage     No. of      Heights of Columns
 Rate in Cents    Concerns      Rate in Cents  Concerns     Adjusted to Preserve
                                                            Frequency Area in
                                                            Unequal Intervals

  27.5-32.4          2           27.5-37.4         4                2
  32.5-37.4          2           37.5-47.4        12                6
  37.5-42.4          2           47.5-48.4         3               15
  42.5-47.4         10           48.5-49.4         5               25
  47.5-52.4         24           49.5-50.4         8               40
  52.5-57.4          8           50.5-51.4         6               30
  57.5-62.4          2           51.5-52.4         2               10
  62.5-67.4          0           52.5-62.4        10                5
  67.5-72.4          2           62.5-72.4         2                1

    Total           52             Total          52

* Source confidential. 

A tabulation of hourly hiring rates in Buffalo is presented in 
Table 66. Distribution A is divided into equal five-cent class intervals. 
Nearly half of the cases fall in one class and as a result the table 
does not provide as much information as we should like to have 
from a frequency table concerning hiring rates. The effect of this 
concentration is even more evident in the histogram of Figure 55-A. 
The middle rectangle is so large in relation to the others that the 
comparison of frequencies by means of the areas of the several rec- 
tangles discloses little information beyond the fact, evident from the 
table, that the hiring rates are concentrated around 50 cents an hour. 

In order to learn more about hiring rates, Distribution B was pre- 
pared from the original source. Ten-cent intervals were used below 
47.5 cents and above 52.5 cents, but the middle class was subdivided 
into 5 one-cent intervals to provide additional detail concerning the 
24 cases in that class. That is, unequal class intervals were introduced 
as a means of increasing the usefulness of the table. 

This distribution is plotted in Figure 55-B in the usual way, i.e., 
with the heights of the rectangles of the histogram representing the 
frequencies just as written in column 2 of the table. The appearance 
of the graph is sufficient evidence of its inaccuracy. The difficulty lies 
in the fact that the areas allotted to the frequencies in the several 
classes do not correspond to the equivalent areas in Figure 55-A. The 
contrast can be followed in a tabulation of the two graphs (Table 67). 

The areas within the several classes are not the same for A and B 
nor are the two total areas. The last two columns indicate, however, 

TABLE 67 

AREAS OF CORRESPONDING RECTANGLES OF FIGURE 55 
(Frequency X width of class interval) 

                          (1)                     (2)                     (3)
CLASS INTERVALS       FIGURE 55-A             FIGURE 55-B             FIGURE 55-C

27.5-37.4            2 X 5 =  10             4 X 10 =  40            2 X 10 =  20
                     2 X 5 =  10
                     Sum      20

37.5-47.4            2 X 5 =  10            12 X 10 = 120            6 X 10 =  60
                    10 X 5 =  50
                     Sum      60

47.5-52.4           24 X 5 = 120             3 X 1 =    3           15 X 1 =   15
                                             5 X 1 =    5           25 X 1 =   25
                                             8 X 1 =    8           40 X 1 =   40
                                             6 X 1 =    6           30 X 1 =   30
                                             2 X 1 =    2           10 X 1 =   10
                                             Sum       24           Sum       120

52.5-62.4            8 X 5 =  40            10 X 10 = 100            5 X 10 =  50
                     2 X 5 =  10
                     Sum      50

62.5-72.4            0 X 5 =   0             2 X 10 =  20            1 X 10 =  10
                     2 X 5 =  10
                     Sum      10

Total Area                   260                      304                     260


exactly what should be done to Distribution B in order to obtain a 
graph whose area will be comparable with the graph of Distribution A. 
The areas of rectangles on class intervals that have been increased from 
five cents to ten cents in Figure 55-B have been doubled, and the areas 



FIGURE 55 

FREQUENCY DIAGRAMS OF HOURLY WAGE RATES PAID BY FIFTY-TWO 
INDUSTRIAL CONCERNS 

[Three histograms: A, equal intervals; B, unequal intervals, incorrect; 
C, unequal intervals, correct. Vertical scale: number of concerns. 
Horizontal scale: rates in cents, 27.50 to 72.50. Data from Tables 66 and 67.] 


of rectangles on the middle class interval, which has been subdivided 
into 5 one-cent intervals, total one-fifth of the former amount. Therefore, 
dividing the frequencies of the first, second, eighth, and ninth 
classes by 2 (i.e., multiplying by 5/10), and multiplying each of the 
intervening classes by 5 (i.e., by 5/1), gives the set of heights of 
columns adjusted to compare with frequencies of even five-cent intervals. 
These adjusted heights, as listed in column 3 of Table 66, are 
used for the histogram shown in Figure 55-C. The areas of this 
diagram agree with those in Figure 55-A (shown in Table 67). 
The first two rectangles of A are identical with the first rectangle 
of C. The areas of the third and fourth rectangles of A are equivalent 
to the area of the second rectangle of C. Similar computations show 
the equivalence of all the corresponding rectangles in the two diagrams. 
Thus in C the areas of the total and the individual classes respectively 
have been preserved. At the same time additional information has 
been presented concerning the large number of hiring rates in the 
middle class without sacrificing any essential facts relative to the rates 
in the other class intervals. 

The general rule for adjusting the frequencies for use in a diagram 9 
or a distribution containing unequal class intervals may be stated as 
follows: divide the actual frequency of each class by the width of 
the class to obtain unit frequencies; multiply these unit frequencies 
by the width of the equal class intervals of another distribution with 
which they are to be compared. If no comparison is involved the unit 
frequencies themselves may be plotted or any constant multiple of them. 
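
The general rule can be tried out on Distribution B of Table 66. The following Python sketch is an illustration only, not part of the original text; the function name adjusted_heights and the layout of the data are assumptions made for the example. It divides each frequency by its class width and multiplies by the five-cent width of Distribution A, which should reproduce column 3 of Table 66 and the total area of Figure 55-C in Table 67.

```python
# Distribution B of Table 66: (class interval, width in cents, frequency).
distribution_b = [
    ("27.5-37.4", 10, 4),
    ("37.5-47.4", 10, 12),
    ("47.5-48.4", 1, 3),
    ("48.5-49.4", 1, 5),
    ("49.5-50.4", 1, 8),
    ("50.5-51.4", 1, 6),
    ("51.5-52.4", 1, 2),
    ("52.5-62.4", 10, 10),
    ("62.5-72.4", 10, 2),
]


def adjusted_heights(classes, standard_width):
    """Return column heights comparable with intervals of standard_width."""
    heights = []
    for _label, width, frequency in classes:
        unit_frequency = frequency / width  # frequency per one-cent unit
        heights.append(unit_frequency * standard_width)
    return heights


heights = adjusted_heights(distribution_b, standard_width=5)

# Area of each rectangle = height x width of class interval, as in Table 67.
total_area = sum(
    height * width
    for height, (_label, width, _freq) in zip(heights, distribution_b)
)

for (label, _width, _freq), height in zip(distribution_b, heights):
    print(f"{label:>10}  {height:5.1f}")
print("Total area:", round(total_area))
```

Passing the unit frequencies through unchanged (standard_width=1) corresponds to the last sentence of the rule, where no comparison with another distribution is involved.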

Comparison of Two Distributions. In applied statistical work as 
well as in more advanced analysis occasions arise which require that 
two distributions be represented on the same graph. The purpose is 
to show the relation between the contours of the two curves and the 
positions of measures descriptive of the two distributions. Polygons 
should be used for comparison because the rectangles of histograms 
would overlap making clear-cut representation impossible. 

In comparing two polygons certain requirements must be met. The 
class intervals of the two distributions must be the same, and all of 
the intervals must have the same width. Percentage frequencies must 
be used to give comparable areas. Two distributions that are to be 



9 When the class intervals are unequal the transfer from the histogram to the polygon 
is not justified because in joining the midpoints of adjacent columns the triangular areas 
added and subtracted are not equivalent. 






compared must be brought into conformity with these rules before 
the graph is planned. 

TABLE 68 

NUMBER AND PERCENTAGE DISTRIBUTION OF 500 FAMILIES IN BUFFALO, NEW YORK, 
ACCORDING TO VALUE OF MONTHLY RENTALS PAID 
(Approximated from reports of real estate dealers) 

                                           FAMILIES
MONTHLY RENTAL                    Number        Percentage
                                                Distribution

$ 7.50 and less than $17.50          93             18.6
 17.50 and less than  27.50         167             33.4
 27.50 and less than  37.50         127             25.4
 37.50 and less than  47.50          59             11.8
 47.50 and less than  57.50          30              6.0
 57.50 and less than  67.50          12              2.4
 67.50 and less than  77.50           7              1.4
 77.50 and less than  87.50           4               .8
 87.50 and less than  97.50           1               .2

 Total                              500            100.0




Figure 56 contains two polygons representing comparable material. 
One is a reproduction of Figure 52-B in per cents, showing the rentals 
paid in Columbus, and the other is a diagram of the data in Table 68, 
showing the distribution of rentals in Buffalo. The Buffalo distribution 
is only approximate because the reports of real estate offices may 
contain some duplication. In the main, however, the sample is a 
representative cross-section of rents in Buffalo. 
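
The conversion to percentage frequencies required for such a comparison is a single division per class. The short Python sketch below is illustrative only; the helper name percentage_distribution is an assumption of the example. It recomputes the last column of Table 68 from the family counts.

```python
# Number of families in each rent class of Table 68 (Buffalo).
buffalo_counts = [93, 167, 127, 59, 30, 12, 7, 4, 1]


def percentage_distribution(counts):
    """Express each class frequency as a per cent of the total frequency."""
    total = sum(counts)
    return [100 * count / total for count in counts]


buffalo_percents = percentage_distribution(buffalo_counts)

for count, percent in zip(buffalo_counts, buffalo_percents):
    print(f"{count:4d}  {percent:5.1f}")
```

Because every percentage distribution sums to 100, two polygons drawn from such columns enclose comparable areas even when the samples differ in size.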

FIGURE 56 

PER CENT COMPARISON OF TWO DISTRIBUTIONS OF RENT DATA 

[Two frequency polygons, one for Buffalo and one for Columbus. Vertical 
scale: per cent of families. Horizontal scale: dollars of rent paid. 
Data from Tables 61 and 68.] 


Comparison of the two polygons shows that the general level of 
rents is lower in Buffalo than in Columbus. The Buffalo curve is some- 
what smoother since it is based on a larger sample. Regardless of 
the difference in size of sample there is a noticeably greater concen- 
tration of rents in Buffalo in the classes below $47.50, and correspond- 
ingly a much smaller proportion in the higher rent brackets. 

The higher rent level in Columbus is presumably an expression 
of an increase in demand for housing due to expanding personnel 
of the state government unaccompanied by an equivalent growth in 
housing facilities. If this explanation is correct, the difference in con- 
ditions is presumably temporary. If other basic causes are present, 
further investigation might uncover a permanent differential in the 
rent levels of the two cities. 

Types of Curves 

One branch of advanced statistics deals solely with the various 
types of frequency curves and the development of measures used in 
analyzing them. The subject is introduced here in order to acquaint 
the student with the graphic appearance of the types of curves most 
frequently encountered in practical work. 

Any description of the different types of curves centers around 
the "normal" or "bell-shaped" curve. It is a portrayal of the distribu- 
tion of an infinite number of identically obtained measurements of 
a fixed object. That is, if an extremely large number of persons, all 
equally skillful, all possessed of normal vision and all using the same 
minutely graduated measuring device, were to measure the width of a 
room, the results would vary above and below the true width of the 
room as indicated by the normal curve. The curve is, therefore, a 
picture of the variations due to pure chance. But in practice pure 
chance is usually mingled with other uncontrolled causes of variation 
to such an extent that a normal distribution is seldom found outside 
the laboratory. Yet many of the distributions with which we deal differ 
so little from the normal curve that its characteristics are transferable, 
and in addition the normal curve serves as a guide to the description 
of other types of curves. 

In Figure 57 six other curves are presented with the normal curve. 
The two curves in B are symmetrical like the normal curve but 
one is flatter and the other more sharply peaked. The flat-topped 
curve would result from a distribution in which extraneous factors 



FIGURE 57 
TYPES OF CURVES 

had produced more variability than would arise from pure chance. 
The peaked curve would result from a distribution in which extraneous 
factors tended to offset natural chance variability. C and D are 
skewed curves depicting distributions in which a controlling factor 
enters more strongly on one side of the peak than on the other side. 
These curves have a "long side" and a "short side." Methods of 
interpreting these are explained under the subject of skewness in 
chapter XVIII. E and F are less usual but are types that are encoun- 
tered occasionally. 




Detailed analysis of these curves belongs in more advanced statistical 
work, but some of the properties of the normal curve and its use in 
the development of the principles of reliability of samples will be the 
subject matter of chapters XXVIII and XXIX. Ability to recognize the 
several types is essential to an understanding of the chapters dealing 
with averages and dispersion. 

The Lorenz Curve 

A special type of diagram used to show the nature of the concen- 
tration of cases in one or more frequency distributions is known as 
the Lorenz Curve. 10 The method of preparing data for presentation 
in a Lorenz Curve can be explained best from an example. Table 69 
gives the number of independent retail grocery stores operating in 
Buffalo in 1929 and 1935 classified according to size as measured 
by sales. 

Column 3 is obtained by changing column 2 to per cents and in 
column 4 these per cents are cumulated. Column 7 is the result of three 
steps: (a) multiply the midpoint, column 1, of each class by the num- 
ber of stores, column 2, in the class to obtain the total sales by stores 
of that size, column 5; (b) express each of these products as a per- 
centage of the total sales of all stores, column 6; (c) cumulate these 
per cents. A parallel procedure using the frequencies for 1935 leads 
to the cumulative per cents of the lower part of the table. 
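
The steps just described can be sketched in Python. Because the figures of Table 69 are not reproduced here, the midpoints and store counts below are hypothetical values chosen only to show the arithmetic, and the function name lorenz_points is likewise an assumption of this example.

```python
def lorenz_points(midpoints, store_counts):
    """Return (cumulative % of stores, cumulative % of sales) for each class."""
    # Step (a): sales of each class = class midpoint x number of stores.
    class_sales = [m * n for m, n in zip(midpoints, store_counts)]
    total_stores = sum(store_counts)
    total_sales = sum(class_sales)

    points = [(0.0, 0.0)]  # the curve starts at the lower left-hand corner
    cumulative_stores = 0
    cumulative_sales = 0
    for stores, sales in zip(store_counts, class_sales):
        cumulative_stores += stores
        cumulative_sales += sales
        # Steps (b) and (c): per cent of total, cumulated class by class.
        points.append((100 * cumulative_stores / total_stores,
                       100 * cumulative_sales / total_sales))
    return points


# Hypothetical size classes (midpoints in dollars of sales), smallest first.
midpoints = [5_000, 15_000, 30_000, 75_000]
store_counts = [400, 300, 200, 100]

points = lorenz_points(midpoints, store_counts)
for stores_pct, sales_pct in points:
    print(f"{stores_pct:6.1f} per cent of stores  {sales_pct:6.1f} per cent of sales")
```

With these invented figures the smallest 40 per cent of stores account for only 10 per cent of sales, so the plotted points fall below the diagonal.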

In plotting the points in Figure 58, the cumulative per cents of 
stores, column 4, are located from the base scale and the cumulative 
per cents of sales, column 7, from the vertical scale. Each curve 
therefore begins at the lower left-hand corner of the diagram and 
ends at the upper right-hand corner. 11 If all of the stores had equal 
sales, then any 10 per cent of the stores would have 10 per cent of 
the sales volume, any 20 per cent would have 20 per cent of the sales 
volume and so on, and the plotted points would fall on the diagonal 
line of the diagram. Hence this diagonal is known as the line of equal 
distribution. The departure of the actual curves from this line shows 
the extent of the concentration of sales volume in the larger-sized 

10 This curve is named after M. O. Lorenz, who developed it and employed it mainly 
in his studies of wealth. See M. O. Lorenz, "Methods of Measuring the Concentration 
of Wealth," Journal of the American Statistical Association, New Series No. 70 (June, 
1905), pp. 209-19. 

11 The base scale is sometimes arranged in reverse order, from right to left, so that 
the curves will extend from the right of the base scale to the top of the vertical scale. 



[TABLE 69 not reproduced: number of independent retail grocery stores 
in Buffalo, 1929 and 1935, classified by size of sales. Columns: (1) midpoint 
of sales class; (2) number of stores; (3) percentage distribution of stores; 
(4) cumulative per cent of stores; (5) total sales of class; (6) per cent of 
total sales; (7) cumulative per cent of sales.] 




FIGURE 58 

LORENZ CURVES: CUMULATIVE PER CENTS OF STORES AND SALES, INDEPENDENT RETAIL 
GROCERY STORES IN BUFFALO, 1929 AND 1935 

[Vertical scale: per cent of total sales, 0 to 100. Horizontal scale: 
per cent of stores, 0 to 100. Data from Table 69.] 




stores. The greater distance of the 1935 curve from the diagonal line 
shows the growth of concentration between 1929 and 1935. 

The Lorenz Curve is valuable, both in analysis and in presentation 
whenever distribution according to two quantitative attributes is of 
importance. There has been frequent use of this graph during recent 
years when distribution of business and income have been under 
discussion. 

PROBLEMS 

1. a) What are the advantages of Table 57, page 352, of the text as com- 

pared with Table 56, page 351? 
b) Describe exactly how you would obtain Table 57 from Table 56. 






2. What information concerning rentals can be obtained from Figure 49, 
page 353? from Figure 50, page 354? 

3. a) Name the four principles that must be observed in planning a frequency 

distribution. 
b) State the main points to be considered in applying each principle. 

4. Indicate which of the following are correct statements and amend any that 
are incorrect: 

a) The presence of artificial grouping in an array can be disregarded in 
preparing a frequency distribution. 

b) All frequency distributions should have at least ten class intervals. 

c) Class intervals can be of equal or unequal width at the convenience of 
the person preparing a distribution. 

d) Class limits should be established so that the average value of the items 
included in each interval is approximately equal to the class mark of the 
interval. 

e) In preparing a distribution of continuous data the only way to designate 
class limits is by writing the class marks. 

f) The following is a discrete distribution: 

        500,000 up to 1,000,000 ..........  9 
        300,000 up to   500,000 ..........  [illegible] 
        etc. 

5. State wherein each of the following meets or fails to meet the principles of 
construction of a frequency distribution. 

                 A                                        B

   INCOME             AVERAGE               AGE (YEARS)      No. OF PERSONS
                    MONTHLY RENT

   Under $500          $25.90               All ages            6,930,446
   $ 500 to   700       22.90               Under 5               535,600
     700 to 1,000       22.80               Under 1               100,398
   1,000 to 1,200       26.00               5 to 9                577,284
   1,250 to 1,500       28.10               10 to 14              575,300
   etc.                                     15 to 24            1,287,625
                                            25 to 34            1,345,984
                                            etc.

                 C                                        D

   TYPE OF          No. OF FAMILIES         YEAR OF          EARNING OVER $4,000
   DWELLING          PROVIDED FOR           GRADUATION      Per Cent     Per Cent
                                                            of Class     of Group

   One-family           4,620               1935                2           18
   Two-family             159               1936                5           26
   Multi-family         1,195

   Total                5,974




6. a) Describe the construction of a histogram; a frequency polygon. 

b) What is the relation of the two types of diagram to discrete and con- 
tinuous data? 

7. For what kinds of information is the ogive preferable to the ordinary dis- 
tribution? 

8. a) Under what circumstances is it desirable to use unequal class intervals? 
b) Explain with an example of your own the method of preserving areas in 

a diagram of a frequency distribution with unequal class intervals. 

9. a) What is the reason for using percentage frequencies in comparing two 

distributions? 

b) In what situation would percentage frequencies be unnecessary for 
comparing two distributions? 

10. a) Make a frequency table, using the 112 items in the 4 columns assigned 
to you from the following table. (See numbered assignments at the 
top of page 386.) 

b) Give reasons for your choice of class limits and width of class intervals. 

c) Draw a graph showing your frequency distribution. 

d) What information concerning wages of semi-skilled female workers in 
this hosiery mill can be derived from your table and graph? 

WEEKLY EARNINGS OF 168 SEMI-SKILLED FEMALE WORKERS, IN HOSIERY MILL XYZ * 



    (a)      (b)      (c)      (d)      (e)      (f)

   15.20    18.00    11.20    16.00    20.00    13.60
   11.60    14.00    12.00    11.30    12.20    12.00
    8.00    12.00    17.60    15.60     8.50     8.00
   12.80    12.80     9.50    12.00    14.50    10.00
   14.00    11.80    12.00    10.60    16.00    12.60
    6.40     9.20    14.00    12.00    12.60    14.00
   12.00     7.60    12.00    15.00    12.00     6.50
   12.40    14.80     8.20     6.00     8.00    16.00
   24.00    18.00    28.00     8.00    19.00    14.00
   14.60    16.80    16.80    16.00    22.00    14.60
    9.00    14.20    14.40    17.20    15.20    19.20
   16.50    12.00    21.20    14.40    10.00    12.30
   20.00    12.00    20.00    12.50    14.00    11.60
   18.00    21.00    23.00    20.00    16.00    16.40
   14.10     8.00    14.00    18.80    16.40    16.00
   22.50    16.00    16.10    12.00    12.00    20.00
   12.00    24.00    19.90    12.00    23.80    21.40
   20.80    19.60    12.90     8.40    28.40    24.00
   16.00    27.00    24.00    23.50    17.30    28.80
   18.00    20.00    16.00    20.00    18.00    15.20
    7.20    10.40     8.00    21.60    14.00    25.00
   14.00    15.50    11.80    24.40    11.40    12.00
   26.00    21.80    15.00    14.00    24.50    20.40
   16.00    14.00    16.00    16.20     6.00    17.60
   16.00     6.00    12.40    28.00    20.00     8.80
   12.00    16.00    18.40    16.90    16.00    16.00
   19.40    12.40    15.50    13.00    12.00    18.00
   10.00    16.00     6.00    14.00    13.20    12.00

* Based on similar data appearing in a 1939 issue of the Monthly Labor Review. 



Assignments 

No.  Columns        No.  Columns        No.  Columns

 1   a b c d         6   a b e f        11   b c d e
 2   a b c e         7   a c d e        12   b c d f
 3   a b c f         8   a c d f        13   b c e f
 4   a b d e         9   a c e f        14   b d e f
 5   a b d f        10   a d e f        15   c d e f

REFERENCES 

CHADDOCK, ROBERT E., Principles and Methods of Statistics. Boston: Houghton 
Mifflin Co., 1925. 

Chapter V deserves a careful reading, particularly the location of partition 
values, pp. 61-^5. 

ELDERTON, W. PALIN, and ELDERTON, ETHEL M., Primer of Statistics. London: 
A. and C. Black, Ltd., 1920. 

Chapters I-IV give an extremely simple statement of the fundamentals of 
the preparation of frequency distributions and the meaning of measures of 
central tendency and dispersion. 

MUDGETT, BRUCE D., Statistical Tables and Graphs. Boston: Houghton Mifflin 
Co., 1930. 

The explanation of "Graphs of Frequency Distributions" on pp. 102-21 
warrants careful reading. 

TRELOAR, ALAN E., Elements of Statistical Reasoning. New York: John Wiley 
and Sons, 1939. 

A clear explanation of the difference between discrete and continuous 
distributions and adjustment for intervals of different width, pp. 26-35. 

YULE, G. UDNY, and KENDALL, M. G., Introduction to the Theory of Statistics. 
London: Charles Griffin and Company, Ltd., 1937. 

The basic principles of the construction of frequency distributions and 
graphs are presented in chapter VI. 



CHAPTER XVI 

MEASURES OF CENTRAL TENDENCY: AVERAGES OF CALCULATION 

INTRODUCTION 

THE whole process of statistical analysis is characterized by the 
attempt to reduce the details of masses of data and to develop 
summary figures. The initial stages in this analysis have already 
been pointed out: a statistical table classifies masses of separate items 
into a small number of comparable groups; a graph is planned to con- 
centrate attention on one or a few major characteristics of a set of data; 
a ratio involves the substitution of one simple figure for two or more 
unwieldy ones; and a frequency distribution condenses a long list of 
separate items into usable form by substituting class values for indi- 
vidual values. The statistician's working equipment must include a 
knowledge of these various descriptive devices. Another basic tool 
needed in analysis is the average. An average is frequently described 
as a "measure of central tendency" because it provides a single sum- 
mary figure by means of which an entire set of data may be represented. 

Measures of central tendency are familiar to statisticians and laymen 
alike in such examples as average weekly wages, average prices of 
securities, average daily temperature, a man of average height, a 
medium-sized house, and the usual amount of rainfall. Familiarity, 
however, tends to obscure the fact that several different concepts of 
"average" are involved in these examples and that different methods of 
computation are employed in obtaining them. It follows, therefore, 
that several types of summary figures or averages must be explained 
in developing the subject. 

Measures of central tendency fall into two groups: (1) those ob- 
tained by calculation, and (2) those defined by position. Each group 
contains two fundamental averages that have sufficient practical appli- 
cation to warrant explanation in this book. They are, 

Averages of Calculation Averages of Position 

Arithmetic Average Median 

Geometric Average Mode 


The remainder of this chapter will be devoted to the arithmetic 
average and the geometric average. The averages of position and 
criteria for evaluating the four measures of central tendency will be 
described in the following chapter. 

THE ARITHMETIC AVERAGE 

The Average of Ungrouped Data 

The measure of central tendency most commonly known and recog- 
nized is the arithmetic average, which is frequently called the arithmetic 
mean or, more simply, the mean. It is calculated by adding together 
all the items in a group or series, and dividing their sum by the number 
of items. For instance, the arithmetic average of a student's examina- 
tion grades is calculated by adding the grades of all the examinations 
and dividing their sum by the number of examinations. Table 70 illus- 
trates the method. 

TABLE 70 
CALCULATION OF ARITHMETIC AVERAGE OF EXAMINATION GRADES 

First examination ...........  75 
Second examination ..........  93 
Third examination ...........  87 
Fourth examination ..........  88 
Fifth examination ...........  93 

Total, 5 examinations ....... 436 

Average = 436 ÷ 5 = 87.2 

There are several characteristics of this simple problem which should 
be emphasized because they apply to all arithmetic averages. First, all 
five items are included in the total, which is divided by 5 to obtain 
the average. Second, a change in any one of the examination grades 
will affect the value of the average; the average of grades would be 
increased 5 points by changing the grade of 75 to 100. All students 
have without doubt made this kind of calculation by assuming different 
examination grades, before and after an examination. Third, extreme 
values, either high or low, may produce a value of the average which 
is not representative of the data. Unusual values have the greatest 
effect when the average is based on a small number of items. 
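
The calculation of Table 70 and the second characteristic noted above can be checked with a few lines of Python. This is an illustrative sketch only; the function name arithmetic_average is an assumption of the example.

```python
def arithmetic_average(values):
    """Sum of the items divided by the number of items."""
    return sum(values) / len(values)


grades = [75, 93, 87, 88, 93]             # Table 70
average = arithmetic_average(grades)       # 436 / 5

# Changing the grade of 75 to 100 adds 25 points to the total,
# and therefore 25 / 5 = 5 points to the average.
changed_grades = [100, 93, 87, 88, 93]
changed_average = arithmetic_average(changed_grades)

print(round(average, 1), round(changed_average, 1))
```

Multiplying the average by the number of items returns the original total of 436, which is the substitution property of the arithmetic average.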

This method of calculating the arithmetic average can be applied 
to the sample of rentals paid by families in Columbus as arrayed in 
Table 57, page 352. The sum of all the 155 monthly rentals is 
determined to be $6,307. This total divided by 155 gives the arithmetic 
average, $40.69.* If each of the 155 families in the group had paid a 
monthly rental equal to the arithmetic average, $40.69, the total amount 
would be the same as was actually paid. The arithmetic average, then, 
is the value which can be substituted for every actual value in a group 
without changing the sum of all the values. 

The Weighted Average and Weighted Total 

There are many cases in which values to be averaged or totaled are 
of different degrees of importance. When this is the situation, it is 
necessary to weight the separate items by multiplying each by a factor 
which represents its relative importance in the group. 

Weighted Average. The problem of weights usually arises in the 
calculation of course grades. For example, suppose that the fifth grade, 
93, in Table 70, was for a final examination, and consequently was 
twice as important as any other examination grade received. It would 
be multiplied by 2 and the total divided by 6 (the sum of the weights) 
as indicated in Table 71. The weighted average is one grade point 
larger than the unweighted average. In some cases the weighting might 
cause a much greater difference than one grade point, and it might 
cause the average to increase (as in this case) or to decrease. 

TABLE 71 
CALCULATION OF WEIGHTED AVERAGE OF EXAMINATION GRADES 

EXAMINATION       GRADE       WEIGHT       GRADE X WEIGHT

First               75           1               75
Second              93           1               93
Third               87           1               87
Fourth              88           1               88
Final               93           2              186

Total                            6              529

Weighted arithmetic average = 529 ÷ 6 = 88.2. 
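
The same computation can be written as a short Python sketch. It is offered only as an illustration; the function name weighted_average is an assumption of the example.

```python
def weighted_average(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    products = [value * weight for value, weight in zip(values, weights)]
    return sum(products) / sum(weights)


grades = [75, 93, 87, 88, 93]    # Table 71
weights = [1, 1, 1, 1, 2]        # the final examination counts double

weighted = weighted_average(grades, weights)      # 529 / 6
unweighted = sum(grades) / len(grades)            # 436 / 5

print(round(weighted, 1), round(unweighted, 1))
```

Giving every grade a weight of 1 reduces the weighted average to the simple arithmetic average of Table 70.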



1 A formula for the arithmetic average can be developed from this calculation, 

$$\text{Arithmetic average} = \frac{\Sigma X}{N},$$ 

where $\Sigma$ (sigma) stands for "the sum of"; $X$ stands for any value of the variable, rent; 
and $N$ represents the total number of items (that is, rentals). There are several 
commonly used symbols for the arithmetic average which are employed under different 
circumstances. In elementary work it is usually represented by its initials, A.A., or by 
$M$ (Mean). The latter will be employed in this text. If several averages are being used, 
a subscript is added to $M$ to designate the variable of which it is the average: e.g., $M_x$ 
for the arithmetic average of the $X$'s, $M_y$ for the average of the $Y$'s, etc. When used in 
algebraic manipulation the average is sometimes represented by its formula, or by the 
symbol $\bar{X}$. 






The weighted average of examination grades alone does not pro- 
vide a complete basis for a course grade. For the latter, it is necessary 
to include laboratory work, classroom responses, and special assign- 
ments or reports, as well as examinations. To average the grades of 
these diverse elements, each of which is of different importance, it is 
again necessary to employ weights. Since each type of grade included 
might be a weighted average, like the examination grade above, the 
final course grade becomes a weighted average of weighted averages. 

A weighted average requires careful consideration of the items being 
averaged, in order to arrive at an equitable basis for establishing the 
weights. In the case of a course grade in statistics, after scrutinizing 
all elements involved, it may have been decided that they should have 
the measures of importance shown in column 1, Table 72. 

The per cents which represent the importance of each course element 
in the final grade may be called weights. The weighted average is 
obtained by (1) multiplying the value of each item by its weight, a 
measure of its importance in the total; (2) dividing the sum of these 
products by the sum of the weights. The method of calculating a 
weighted average to obtain a course grade for a student in elementary 
statistics is shown in Table 72. 



TABLE 72 

CALCULATION OF WEIGHTED AVERAGE OF ELEMENTS OF COURSE 
TO OBTAIN FINAL COURSE GRADE 

                            (1)            (2)             (3)
ELEMENT                   WEIGHT         AVERAGE      AVERAGE GRADE
                        (per cent)        GRADE          X WEIGHT

Examinations                60              88             5,280
Laboratory work             20              80             1,600
Classroom response          10              90               900
Homework                    10              70               700

Total                      100                             8,480

Weighted average * = 8,480 ÷ 100 = 84.8 

* If the letter W stands for the weight and n stands for the number of items, the process can 
be described algebraically as: 

$$\text{Weighted average} = \frac{X_1 W_1 + X_2 W_2 + \cdots + X_n W_n}{W_1 + W_2 + \cdots + W_n}$$ 

In summary form: 

$$\text{Weighted average} = \frac{\Sigma (XW)}{\Sigma W}$$ 


The weighted average, 84.8, is greater than the simple arithmetic 
average (the sum of the grades in column 2 divided by 4), by more 
than 2 points. This increase of 2 points over the simple average is due 
to the increased importance which is given to the high average exam- 
ination grade of 88 through the process of weighting. 

The weights of the several parts of the statistics course are shown 
in Table 72 as per cents, the sum of which is 100, representing the 
whole course. From elementary arithmetic it will be remembered that 
the numerator and denominator of a fraction can be multiplied or 
divided by the same number without changing the value of the fraction. 
Consequently, the absolute values of the weights can be replaced by 
relative values proportional to the absolute values. The relatives are 
easier to use and give the same result. 
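
A short Python sketch, given only as an illustration, shows that proportional weights lead to the same course grade. The relative weights 6, 2, 1, 1 are simply the per cents of Table 72 divided by 10; the function name weighted_average is an assumption of the example.

```python
def weighted_average(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Table 72: examinations, laboratory work, classroom response, homework.
average_grades = [88, 80, 90, 70]
percent_weights = [60, 20, 10, 10]
relative_weights = [6, 2, 1, 1]      # the same weights divided by 10

course_grade = weighted_average(average_grades, percent_weights)
same_grade = weighted_average(average_grades, relative_weights)
simple_average = sum(average_grades) / len(average_grades)

print(round(course_grade, 1), round(same_grade, 1), round(simple_average, 1))
```

Both sets of weights give 84.8, against a simple average of 82 for the four elements.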

In a study of wages of farm labor in Vermont for the period 1780- 
1937, a weighted average is used to calculate the annual average of 
day wage rates. "The annual wage rates of labor hired by the day 
are weighted averages of the monthly data." 2 The weights assigned to 
the average daily wage rates of the several months are: 

January      5        May        8        September    8 
February     4        June      10        October      8 
March        5        July      18        November     6 
April        7        August    16        December     5 

                                          Total      100 

Examination of these weights reveals that they are closely related both 
to the seasons and to the number of days in each month. The day wages 
in July, for instance, are of most importance because of the great 
amounts of farm labor hired in that month, the usual clemency of the 
weather, and the critical period for growing crops, as well as the num- 
ber of working days in the month. The day rates in July, then, should 
have the greatest influence in the determination of the annual average. 
Conversely, February rates should have the least influence. 

Effect of Weights. One might justifiably ask, "What are the effects 
of weighting on the simple average?" The answer may be generalized 
as follows: 

1. If the larger weights are applied to the smaller values, and the 
smaller weights to the larger values, the influence of the smaller values 
will be increased, and the value of the weighted average will be smaller 
than the value of the unweighted average. 

2. If the larger weights are associated with the larger values and 

2 T. M. Adams, Prices Paid by Farmers for Goods and Services and Received by 
Them for Farm Products, 1790-1871; Wages of Farm Labor, 1780-1937 (A Preliminary 
Report from University of Vermont and State Agricultural College, Burlington, Vermont, 
February, 1939), pp. 43-44. 






the smaller weights with the smaller values, the influence of the larger 
values will be increased and the resulting average will be larger than 
the unweighted average. 

3. If there is no relation between the sizes of the weights and the 
values of the items, that is, if large weights are as frequently assigned 
to small items as to large items and small weights are similarly 
distributed, the difference between the weighted average and the 
unweighted average is likely to be very small and entirely due to chance; 
in fact under these circumstances the two averages may be the same. 3 
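
The first two cases can be illustrated with a few hypothetical figures in Python. The values and weights below are invented for the purpose and do not come from the text; the function name weighted_average is likewise an assumption of the example.

```python
def weighted_average(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [10, 20, 30, 40]                      # hypothetical items
unweighted = sum(values) / len(values)          # 25.0

# Case 1: larger weights on the smaller values pull the average down.
case_1 = weighted_average(values, [4, 3, 2, 1])     # 200 / 10

# Case 2: larger weights on the larger values pull the average up.
case_2 = weighted_average(values, [1, 2, 3, 4])     # 300 / 10

# Equal weights leave the average unchanged.
equal = weighted_average(values, [1, 1, 1, 1])

print(case_1, unweighted, case_2, equal)
```

The third case depends on chance pairings of weights and values, so no single set of figures can establish it; the sketch covers only the two systematic cases and the limiting case of equal weights.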

The computation of a weighted average can be explained from 
Table 73. The per capita sales in column 4 are the results of dividing 
each sales figure of column 3 by the corresponding population in col- 
umn 1. The average per capita sales for the entire table is not obtained 
by averaging the figures in column 4 because the populations repre- 
sented by the cities in the several size groups are different. The proper 
method is to multiply each per capita figure by the population of that 
group, sum the products, and divide by the total population. The result 
of this operation is $97.79, as indicated. The same figure can be ob- 
tained by using as weights the percentage distribution of the population 

TABLE 73

SALES OF RETAIL FOOD STORES (1935) AND POPULATION (1930)
IN CITIES OF DIFFERENT SIZE *

                            POPULATION
                     (1)             (2)            (3)           (4)
                                                RETAIL FOOD    PER CAPITA
SIZE OF CITY        Number        Percentage       SALES         SALES
                (in thousands)   Distribution  (in millions)

250,000 and over     28,785          31.2        $2,839.3       $ 98.64
75,000-250,000        9,771          10.6           941.5         96.36
10,000- 75,000       19,784          21.4         1,998.5        101.02
2,500- 10,000        10,615          11.5         1,175.2        110.71
Under 2,500
 (excluding farms)   23,375          25.3         2,074.2         88.74

Total                92,330         100.0        $9,028.7       $ 97.79

* Population, United States Census, 1930; Food store sales, United States Census of Business,
1935.

in column 2, and dividing the sum of the products by 100. The average 
per capita sales can also be computed by dividing the total sales by 
the total population, i.e., $9,028,700,000 ÷ 92,330,000 = $97.79.
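
The three routes to the average per capita sales described for Table 73 can be sketched in Python. The figures are transcribed from the table; the variable names are assumptions of this sketch.

```python
# Columns of Table 73, one entry per size-of-city group.
population = [28785, 9771, 19784, 10615, 23375]      # column 1, thousands
pct = [31.2, 10.6, 21.4, 11.5, 25.3]                 # column 2, per cent
sales = [2839.3, 941.5, 1998.5, 1175.2, 2074.2]      # column 3, millions of dollars
per_capita = [98.64, 96.36, 101.02, 110.71, 88.74]   # column 4, dollars

# Method 1: weight each per capita figure by its population.
by_population = sum(p * x for p, x in zip(population, per_capita)) / sum(population)

# Method 2: weight by the percentage distribution and divide by 100.
by_percentage = sum(p * x for p, x in zip(pct, per_capita)) / 100

# Method 3: ratio of the two totals (millions of dollars over thousands
# of persons, so multiply by 1,000 to obtain dollars per person).
by_totals = sum(sales) * 1000 / sum(population)

print(round(by_population, 2), round(by_percentage, 2), round(by_totals, 2))
```

Each route rounds to the $97.79 shown in the total row; the small differences beyond the second decimal come from the rounding of the per capita and percentage columns.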

8 Cf. E. C. Rhodes, Elementary Statistical Methods (London: George Routledge &
Sons, Ltd., 1933), pp. 143-45. 




When all of the necessary information is available as in this table, 
the weighted average should be computed as the ratio of the two per- 
tinent totals. In most cases only the individual ratios will be given. 
Then weights must be found in order to compute the weighted average. 
The rule for determining these weights was stated in chapter XI in 
discussing the problem of averaging ratios. 4 That development shows 
that in averaging the per capita figures the denominators from the com- 
putation of the individual per capita figures must be used as weights. 
But the average per capita sales in each size of city multiplied by the 
population in that group gives the total sales in the group and the sum 
of these products is the total sales. Therefore, the two methods just 
described for computing the weighted average are identical and one 
or the other should be used according to the form in which the data 
are available. 

The use of the population column for weighting is required in order 
to retain in the weighted average a characteristic of the simple average, 
i.e., that the average times the number of items will give the total 
value of the original items. In the weighted average that rule becomes: 
the weighted average times the total of the absolute weights must 
equal the total value of the original items. In the example the average 
per capita sales times the total population equals the total sales. This 
characteristic will be referred to as the total value criterion. 

The unweighted average of column 4 is $99.10. The effect of the 
weights, therefore, has been to reduce the value of the average. This 
means that large weights tend to be associated with small per capita 
sales and small weights with large per capita sales. Survey of the table 
shows that, although the relation is somewhat mixed, the next to the 
largest weight is attached to the smallest per capita figure and the 
next to the smallest weight is attached to the largest per capita figure. 

Another use of the weighted arithmetic average is found in the 
computation of the percentage changes in retail sales of independent 
dealers in Ohio. In this case the purpose is to derive from sample data 
an average figure that will represent the universe. Regular reports of 
current retail sales are collected monthly from a large number of 
co-operating merchants representing many lines of retail trade. By 
classifying and tabulating these reports according to the lines of trade 
which they represent, it is possible to calculate the percentage changes 



4 See Chapter XI, pp. 265-49.






from the previous month in retail sales for all independent dealers 
in Ohio, as shown in Table 74. 

TABLE 74

CALCULATION OF PERCENTAGE CHANGE IN RETAIL SALES BY INDEPENDENT DEALERS
IN OHIO, USING WEIGHTED AVERAGE OF RELATIVES DERIVED FROM SAMPLE
REPORTS REPRESENTING VARIOUS LINES OF TRADE, SEPTEMBER,
1939, DIVIDED BY AUGUST, 1939

                                    (1)             (2)              (3)
                                PERCENTAGE     SALES OF EACH       WEIGHTED
                                RELATIVES     LINE AS RELATIVE    PERCENTAGE
LINE OF TRADE                   SEPT. 1939   OF TOTAL NET SALES   RELATIVES
                               ÷ AUG. 1939*     1935 CENSUS       (1) × (2)
                                    (X)             (W)             (XW)

Grocery without meats             116.89           .0384             4.49
Grocery with meats                110.00           .1620            17.82
Country general                   102.88           .0264             2.72
Department stores                 125.45           .1367            17.15
Men's and boys' clothing          126.67           .0277             3.51
Family clothing                   106.88           .0112             1.20
Women's ready-to-wear             123.54           .0260             3.21
Shoes                             147.16           .0125             1.84
Motor vehicles                     86.73           .2031            17.62
Gasoline filling stations          98.95           .0971             9.61
Furniture stores                  100.51           .0380             3.82
Household appliances              122.43           .0123             1.51
Radio stores                      108.30           .0026              .28
Lumber and building material       95.71           .0297             2.84
Heating and plumbing              119.09           .0050              .60
Hardware                          107.16           .0376             4.03
Restaurants                       105.47           .0784             8.27
Drugs                             100.32           .0381             3.82
Florists                          111.29           .0055              .61
Jewelry                           108.54           .0118             1.28

Total                                              1.000           106.23

Weighted percentage change                                          +6.23
Unweighted percentage change†     +15.18

* Unpublished data.

† The unweighted percentage change is obtained by using the unweighted totals of reports
from all lines of trade. The use of a weighting system with a total of 1.0 is explained in
chapter XII.

It would be an easy matter to add the sales reported in all lines for 
the respective periods. The percentage change would then be the ratio 
between the two totals less 100 per cent. But this method, though 
simple, involves a basic error. The total sales of the reporting stores 
in each line of trade do not bear the same proportionate relationship 
to the total of all sales reported as the total value of these same lines 
of trade hold to the actual total value of all retail trade. This dispro- 
portionate relationship is due to the voluntary co-operative arrangement 
by which the collection process is carried on. In order to overcome 
this difficulty weights have been introduced. The percentage relatives, 




column 1, calculated from the total of the reported sales of each line 
of trade, are multiplied by a number, column 2, representing the pro- 
portion which sales by independent dealers in each of the lines of retail 
trade constituted of the total of retail sales by independent dealers in 
Ohio as reported in the Census of Business of 1935. 5 The sum of 
these products, column 3, becomes the basis for calculating the 
weighted percentage change. The effect of weighting is very important 
here, for the weighted change shows an increase of 6.23 per cent, 
whereas the unweighted change shows an increase of 15.18 per cent. 

In this example the weights are not the denominators of the ratios 
to which the weights are applied. As a result, the weighted average 
times the sum of the absolute weights (the total sales upon which the 
relatives in column 2 are based) will not give the total value (in this 
case total sales reported in September, 1939). This departure from the 
established criterion is justified by the purpose of the computation, 
which is to estimate from the reported sample the percentage relation 
in a universe. The figure 106.23 is the best obtainable estimate of the 
percentage relation between the sales in September and August of all 
independent retail outlets in Ohio in the included lines of trade. That 
is, if it were possible to know the sales of all such stores in the state 
in August, the result of multiplying that figure by 1.0623 would ap- 
proach the actual sales of these stores in September. This extension 
of the weighted average is frequently involved in estimating the con- 
ditions of a universe from those found in a sample. 
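
The weighted average of relatives in Table 74 can be reproduced with a short sketch. The relatives (X) and census weights (W) are transcribed from the table; the variable names are assumptions of this sketch. Because the table sums products that were first rounded to two decimals, the result computed here may differ from the printed 106.23 by a few hundredths.

```python
# (line of trade, percentage relative X, weight W) from Table 74
rows = [
    ("Grocery without meats", 116.89, 0.0384),
    ("Grocery with meats", 110.00, 0.1620),
    ("Country general", 102.88, 0.0264),
    ("Department stores", 125.45, 0.1367),
    ("Men's and boys' clothing", 126.67, 0.0277),
    ("Family clothing", 106.88, 0.0112),
    ("Women's ready-to-wear", 123.54, 0.0260),
    ("Shoes", 147.16, 0.0125),
    ("Motor vehicles", 86.73, 0.2031),
    ("Gasoline filling stations", 98.95, 0.0971),
    ("Furniture stores", 100.51, 0.0380),
    ("Household appliances", 122.43, 0.0123),
    ("Radio stores", 108.30, 0.0026),
    ("Lumber and building material", 95.71, 0.0297),
    ("Heating and plumbing", 119.09, 0.0050),
    ("Hardware", 107.16, 0.0376),
    ("Restaurants", 105.47, 0.0784),
    ("Drugs", 100.32, 0.0381),
    ("Florists", 111.29, 0.0055),
    ("Jewelry", 108.54, 0.0118),
]

sum_w = sum(w for _, _, w in rows)                        # close to 1.000
weighted_relative = sum(x * w for _, x, w in rows) / sum_w
weighted_change = weighted_relative - 100                 # percentage change

print(round(sum_w, 4), round(weighted_relative, 2), round(weighted_change, 2))
```

The unweighted change of 15.18 per cent cannot be recomputed from the table, since it rests on the unpublished totals of the reports themselves.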

Choice of Weights. The problem of selecting weights does not 
arise so long as the total value criterion is adhered to. Two types of 
cases must be considered in which the criterion is abandoned. In both 
of these the question of choice of weights is a necessary preliminary 
step to the computation of the average. The principles involved in 
the choice are discussed in general terms here. A more specific discus- 
sion of the weighting problem in the construction of index numbers is 
presented in chapter XIX. 

The first type of case arises when the conditions of a universe are 
to be inferred from a sample. The computation of the percentage 
change in retail trade in Ohio presented in the preceding section is an 

5 It should be observed that the weights in this case represent the fractions which the 
annual volumes of the separate lines constituted of the total annual volume in 1935. 
To the extent that the different retail lines are differently affected by seasonal influences, 
these weights introduce errors. It is felt, however, that the error thus introduced is less 
than that due to lack of representative sampling. 




example. The weights in this case represented the relative importance 
of the several lines of trade and were applied to percentage relations 
in the sample. This is a standard method of weighting to transfer from 
sample to universe, but by no means a universal one. Data in different 
form may require a different method of weighting. Therefore, no rule 
can be offered except the broad one that the weights must be so selected 
that the characteristics of the universe will be correctly inferred. 

The second type of case appears when the true weights required 
to preserve the total value criterion are unavailable but some systematic 
weighting is an obvious necessity. The choice lies between approximat- 
ing the missing true weights and adopting an arbitrary weighting sys- 
tem. The first alternative is employed in computing the average farm 
price of wheat at a particular time. A large number of individual prices 
are reported from various parts of the country but no corresponding 
reports of the amounts sold at various prices are available. Skillful 
estimators supply the missing quantity weights on the basis of whatever 
auxiliary knowledge can be gleaned from sources at their command. 
The result is a weighted average farm price closely approximating the 
correct average price. There is another reason for the success of this 
procedure in the hands of practiced estimators. Small variations in 
the weighting system will have comparatively little effect on the value 
of the average. For this reason Bowley stated as a general rule, "In 
calculating averages give all care to making the items free from bias, 
and do not strain after exactness of weighting." 6
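
Bowley's rule can be illustrated, though not proved, with the per capita figures of Table 73. In the sketch below the population weights are deliberately misstated by 10 per cent, alternately too high and too low; these error factors are hypothetical and chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Return the weighted arithmetic average, sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


population = [28785, 9771, 19784, 10615, 23375]      # Table 73, column 1
per_capita = [98.64, 96.36, 101.02, 110.71, 88.74]   # Table 73, column 4

exact = weighted_mean(per_capita, population)

# Hypothetical estimating errors of 10 per cent in each weight.
factors = [1.1, 0.9, 1.1, 0.9, 1.1]
rough_weights = [p * k for p, k in zip(population, factors)]
approximate = weighted_mean(per_capita, rough_weights)

print(round(exact, 2), round(approximate, 2))
```

Under these assumed errors the average moves by well under one dollar, which is small beside the range of the per capita figures themselves.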

The second alternative, arbitrary weighting, is employed whenever 
an approximation to the true weighting system is not feasible. An 
example is found in the weights suggested on page 391 for determining 
an average monthly wage rate for farm labor. The weights attached 
to the several months are a composite of several elements and, since 
the total value criterion has been abandoned, the accuracy of the 
weighted result is largely dependent upon the judgment of the one 
who established the weighting system. The full meaning of judgment 
weighting is brought out in chapter XXV in connection with the con- 
struction of an index of business conditions from component series of 
unequal importance. 

Weighted Total. In some cases the weighted average is not as 
useful as the weighted total, i.e., Σ(WX). For the most part this

6 Arthur L. Bowley, Elements of Statistics (London: P. S. King and Son, Ltd., 1920,
fourth edition), p. 94.






applies when some standard unit of measurement exists independent 
of the particular investigation, and comparisons must be made in terms 
of that unit. For example, in calculating the cost of living of families 
for use in the administration of unemployment relief, the problem of 
family size arises at once. The number of persons per family is not 
a sufficiently accurate standard. Families of five persons, for instance, 
are not all equivalent, for the ages and sexes of the members of a 
family are very important in determining food requirements as well 
as in estimating clothing needs and housing costs. One approach to 
the solution of the problem of calculating food requirements which 
was presented by a group of experts under the auspices of the League 
of Nations involved assigning weights to persons according to their 
age and sex. A value of 1 was given to a male between 14 and 59 
years of age and all other ages were assigned values relative to this. 
The scale of weights which was developed follows: 



AGE AND SEX                           WEIGHT

Under 2 years, male or female           .2
2- 3 years, male or female              .3
4- 5 years, male or female              .4
6- 7 years, male or female              .5
8- 9 years, male or female              .6
10-11 years, male or female             .7
12-13 years, male or female             .8
14-59 years, male                      1.0
14-59 years, female                     .8
60 years and over, male or female       .8







In order to obtain the weighted total of food units required for a 
family, it is necessary to assign the proper weight to the number of 
family members according to age and sex, and add the products. For 
instance, food units required for a family of five members, father aged 
40, mother aged 35, one son aged 15, and two daughters, one aged 10 
and one 11, is calculated as follows: 



MEMBER                  NUMBER       WEIGHT    TOTAL UNITS

Father 40               1 male         1.0         1.0
Mother 35               1 female        .8          .8
Son 15                  1 male         1.0         1.0
Daughters 10 and 11     2 females       .7         1.4

Total                   5 members                  4.2











The weighted total of 4.2 represents the total number of food units 
required for the family in question. This is an average of .84 food 






units per person for this family. Since the natural unit for relief is 
the family and not the individual, the weighted total is in this case 
a more useful figure than the weighted average. It should be observed 
that this system of weighting did not differentiate food requirements 
of the different sexes under 14 years of age or 60 years and over. 
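
The weighted total for the family above can be sketched as follows. The function name food_unit_weight is hypothetical, and the sketch assumes that ages are given in completed years.

```python
def food_unit_weight(age, sex):
    """Weight on the scale quoted above; sex is 'male' or 'female'."""
    if age < 14:
        # .2 under 2 years, rising by .1 for each two-year age band.
        return [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8][age // 2]
    if age < 60:
        return 1.0 if sex == "male" else 0.8
    return 0.8


family = [(40, "male"), (35, "female"), (15, "male"),
          (10, "female"), (11, "female")]

total_units = sum(food_unit_weight(age, sex) for age, sex in family)
units_per_person = total_units / len(family)

print(round(total_units, 2), round(units_per_person, 2))
```

The weighted total is the sum itself; dividing by the number of members gives the weighted average of .84 mentioned in the text.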

The Average of a Frequency Distribution 

Frequency distributions are so commonly used in analyzing and 
describing various kinds of business data that it is necessary to examine 
the methods by which an arithmetic average can be calculated from 
such a distribution. The method is very similar to that employed in 
obtaining a weighted average. Each midpoint of a class interval is 
multiplied by the frequency of that interval. The sum of the products 
of midpoints multiplied by the frequencies represents the total value 
of all items in the distribution. When this total is divided by the 
sum of the frequencies, the result is the arithmetic average of the 
distribution. 

Frequency distributions should be constructed so that the class marks 
represent all the values included in a class interval. Although this 
standard for construction may not always be attained, the class limits 
should be so established that in each class the midpoint of the class 
interval is approximately equal to the average of the actual values of 
the items. Each midpoint will therefore represent all the values in 
the class. 7 

TABLE 75

CALCULATION OF THE ARITHMETIC AVERAGE FROM THE FREQUENCY DISTRIBUTION
OF RENTALS PAID BY 155 FAMILIES IN COLUMBUS, OHIO

                             (1)          (2)            (3)
CLASS INTERVAL              CLASS                    FREQUENCY ×
(dollars)                  MIDPOINT    FREQUENCY      MIDPOINT
                              X            f             fX

 7.50 and under 17.50       12.50         16           200.00
17.50 and under 27.50       22.50         27           607.50
27.50 and under 37.50       32.50         44         1,430.00
37.50 and under 47.50       42.50         17           722.50
47.50 and under 57.50       52.50         18           945.00
57.50 and under 67.50       62.50         11           687.50
67.50 and under 77.50       72.50         10           725.00
77.50 and under 87.50       82.50          9           742.50
87.50 and under 97.50       92.50          3           277.50

Total                                    155         6,337.50

M = 6,337.50 ÷ 155 = $40.89



7 See chapter XV for a complete discussion of the characteristics of frequency distri- 
butions which will affect the calculation of arithmetic averages. 




Direct Calculation. The arithmetic average of the rent distribution 
which was constructed according to this principle is calculated in Table 
75 to illustrate the method. Since $12.50, the midpoint of the first 
class interval, $7.50 to $17.50, is assumed to represent the average 
rental paid by the 16 families whose rentals fall within the interval, 
the total amount paid by the 16 families should be $12.50 multiplied 
by 16, or $200.00. Likewise, each product in column 3 represents the 
total rental paid by the families in that class interval of rentals. The 
total of column 3, $6,337.50, should therefore be approximately equal 
to the total amount paid for rent by all families in the sample. The 
arithmetic average is found to be $40.89 by dividing this total by 155, 
the number of families. 8 
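
The direct calculation of Table 75 can be sketched in a few lines. The midpoints and frequencies are those of the table; the variable names are assumptions of this sketch.

```python
# Class midpoints X (12.50, 22.50, ..., 92.50) and frequencies f from Table 75.
midpoints = [12.5 + 10 * k for k in range(9)]
frequencies = [16, 27, 44, 17, 18, 11, 10, 9, 3]

# Column 3: frequency times midpoint for each class.
products = [f * x for f, x in zip(frequencies, midpoints)]

total_rent = sum(products)        # estimated total rent paid
n_families = sum(frequencies)     # sum of the frequencies
mean_rent = total_rent / n_families

print(total_rent, n_families, round(mean_rent, 2))
```

Note that the divisor is the sum of the frequencies (155), not the number of classes (9).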

It will be observed at once that $40.89 differs slightly from the figure 
that was obtained by the computation of the arithmetic average of the 
original data on page 389, and the computed total rentals paid, 
$6,337.50, is likewise a different total. A question arises at once as to
which average or total ought to be used. Obviously the computations 
from the original data are more precise, but whether they should there- 
fore be used will depend upon the purpose. The availability of the 
original data may also be a determining factor. On many occasions 
data are available only in frequency distributions so that there is no 
question as to which method of calculation to employ. 

If the average and total were being used by a rental office in con- 
nection with its accounting records, the values of each separate rental 
would be on file and the simple arithmetic total and average would 
probably be computed from them. On the other hand, if the average 
of this sample is to be used to represent the average rent paid by all 



8 It is frequently helpful to be able to describe the calculation of the average from
the frequency distribution by a formula. Such a formula can easily be developed from
the computation just completed. The procedure was as follows: Arithmetic average =
(frequency of first class interval × midpoint of first class interval) + (frequency of
second class interval × midpoint of second class interval) + etc., divided by the sum
of the frequencies. Using the symbols at the head of each column in Table 75, the
formula for the arithmetic average becomes:

\[ M = \frac{\sum fX}{\sum f} \]

X is used to denote the values of the midpoints of class intervals in a frequency distribu-
tion just as X would be used to denote the individual values if the data were ungrouped.
If a student understands the procedure in the calculation of the arithmetic mean, he
need not memorize this formula. Note, however, that the denominator is always the total
number of frequencies, not the number of classes in the distribution.




families in the city, calculation from the frequency distribution is pref- 
erable. 9 

The frequency distribution is a device for summarizing data and 
for reducing the amount of work involved in calculating statistical 
measures. When, as in the case of the rentals, a relatively small num- 
ber of items is included in the distribution, differences may occur in 
the measures of central tendency calculated from the grouped and 
ungrouped data. As the size of the sample increases, however, differ- 
ences in these calculated values tend to disappear. 

Short-Cut Calculation. The direct method of computing the arith- 
metic average from a frequency distribution is not an involved process, 
but the actual steps of multiplication, particularly in large distributions, 
may become a real burden. The number of computations can be reduced 
by the use of short-cut methods and as a consequence the chance for 
errors will be decreased. Although at a first glance, or upon an initial 
trial, these methods may not appear shorter, a little practice will con- 
vince anyone except an arithmetic wizard that much time and labor 
can be saved by employing them. They also lay the foundation for a 
much greater saving in more advanced calculations. 

Method 1 : The arithmetic average of rentals from the sample of 
155 families is calculated by short-cut method 1 in Table 76. 10 

In carrying out the illustrative computations the first step is to select 
one of the midpoints 11 as an assumed average. Before calculation, of 
course, the average is not known, but for illustrative computation A the 

9 See page 368 for discussion of the use of frequency distributions of sample data
for representing the characteristics of a larger universe. 

10 This text employs a fixed notation, the basis of which is as follows:

1. Capital letters (X, Y) are used to denote values of variables measured
from zero, e.g., the midpoints of class intervals in column 1, Table 76.

2. Small letters (x, y) are used to denote values of variables measured from
the average, e.g., the differences of the midpoints of the rent classes from the average
rent, $40.89.

3. The letter, d, is used to denote values of variables measured from an assumed
average, e.g., columns 3 and 5, Table 76.

4. Primes and subscript letters will be used to distinguish variables that are being
compared, e.g., d and d' in columns 3 and 5 to indicate deviations from two different
assumed averages, and d_s to indicate deviations in steps in Table 77, column 3.

5. The measures of central tendency will be designated as follows:

M = true arithmetic average
M' = assumed arithmetic average
Me = median
Mo = mode
G.M. = geometric average.

6. Additions to the notation will be made in subsequent chapters as the need arises.

11 Any value can be used as the assumed average, but it has become customary to 
use the midpoint of a class interval because it affords easiest computation. 






TABLE 76

SHORT-CUT METHOD 1 FOR COMPUTING THE ARITHMETIC AVERAGE
FROM THE FREQUENCY DISTRIBUTION OF RENTALS PAID BY
155 FAMILIES IN COLUMBUS, OHIO

                                          ILLUSTRATIVE             ILLUSTRATIVE
                                          COMPUTATION A            COMPUTATION B
                        (1)     (2)       (3)         (4)          (5)         (6)
                                         Dollar                   Dollar
                                        Deviation   Frequency    Deviation   Frequency
CLASS INTERVAL         CLASS    FRE-   of Midpoint      ×       of Midpoint      ×
(dollars)              MID-    QUENCY  from Assumed Deviation   from Assumed Deviation
                       POINT             Average    (2) × (3)     Average    (2) × (5)
                                        of $42.50                of $22.50
                         X       f          d          fd           d'          fd'

 7.50 and under 17.50   12.50    16        -30        -480         -10         -160
17.50 and under 27.50   22.50    27        -20        -540           0            0
27.50 and under 37.50   32.50    44        -10        -440         +10         +440
37.50 and under 47.50   42.50    17          0           0         +20         +340
47.50 and under 57.50   52.50    18        +10        +180         +30         +540
57.50 and under 67.50   62.50    11        +20        +220         +40         +440
67.50 and under 77.50   72.50    10        +30        +300         +50         +500
77.50 and under 87.50   82.50     9        +40        +360         +60         +540
87.50 and under 97.50   92.50     3        +50        +150         +70         +210

Total                           155                   -250                   +2,850

Illustrative Computation A:

M' = 42.50
M  = 42.50 + (-250 ÷ 155)
   = 42.50 - (+250 ÷ 155)
   = 42.50 - 1.61
   = $40.89

Illustrative Computation B:

M' = 22.50
M  = 22.50 + (2,850 ÷ 155)
   = 22.50 + 18.39
   = $40.89



midpoint $42.50 is chosen as the assumed average. The interval for 
which the midpoint is $32.50 is $10 less than the assumed average, and 
so is shown in column 3 of Table 76 as deviating from the assumed 
average by $10; the midpoint $22.50 is shown as $20 less than 
$42.50; the midpoint at $52.50 is shown as $10 more than the assumed 
average, etc. These differences are called actual dollar deviations of 
the midpoints from the assumed average, and are shown in order in 
column 3. The deviations are multiplied by their respective frequencies, 
and the products, retaining the algebraic signs of the deviations, are 
shown in column 4. These products are the amounts of difference 
between the total rentals actually paid in each class and the rentals 
that would have been paid if everyone had paid a rental equal to the 
assumed average. For instance, the 16 families in the first class interval 
actually paid $480 less than they would have paid if each of the 16 
families had paid $42.50 per month in rent. The net total of column 4 




indicates that the whole group of 155 families actually paid $250 less 
in rent than they would have paid if everyone had paid $42.50, the 
amount of the assumed average. With this information at hand, the 
arithmetic average can be computed as follows: Arithmetic average =
assumed average + the net difference divided equally among all the
items included (prorated net difference), i.e.,

M = 42.50 + (-250 ÷ 155) = 42.50 - 1.61 = $40.89

This is the same average that was obtained by the direct method in 
Table 75. Illustrative computation B, at the right of Table 76, columns 
5 and 6, shows that the assumed average can be taken at a different 
midpoint with identical results. 
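
Short-cut method 1 as worked in Table 76 can be sketched as follows. The function name shortcut_mean is hypothetical; the data are those of the rent distribution.

```python
midpoints = [12.5 + 10 * k for k in range(9)]        # 12.50 ... 92.50
frequencies = [16, 27, 44, 17, 18, 11, 10, 9, 3]


def shortcut_mean(midpoints, frequencies, assumed):
    """Return (net difference, average) using deviations from `assumed`."""
    # Sum of frequency times deviation, keeping the algebraic signs.
    net = sum(f * (x - assumed) for f, x in zip(frequencies, midpoints))
    # Prorate the net difference over all items and add it back.
    return net, assumed + net / sum(frequencies)


net_a, mean_a = shortcut_mean(midpoints, frequencies, 42.50)   # computation A
net_b, mean_b = shortcut_mean(midpoints, frequencies, 22.50)   # computation B

print(net_a, round(mean_a, 2))
print(net_b, round(mean_b, 2))
```

Both assumed averages lead to the same result, which is the point of illustrative computation B.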

Following the same procedure as was employed in the direct method,
a formula for short-cut method 1 can be developed:

\[ M = M' + \frac{\sum fd}{\sum f} \]



Method 2: This method is a modification of method 1. The same 
deviations in actual amounts are used, but instead of being taken as 
actual values they are counted as equal "steps" of deviation from the
assumed average. For instance, in this case each step is defined as equal 
to $10. The calculation is then made as in Table 77. 

The width of the class interval is conveniently chosen as the step 
in distributions having equal class intervals. The midpoint $42.50 is 
again chosen as the assumed average. Each $10 of deviation of a class 
interval midpoint from the assumed average is considered as one step, 
as in column 3. The multiplication of frequencies by these step devia- 
tions is done just as it was in the former illustration. For instance, the 
sum of the products of the frequencies and step deviations, column 4, 
indicates that the 155 families taken all together paid the equivalent 
of 25 steps (each $10 wide) less in rent than they would have paid 
if each rental had been equal to the assumed average. The value of 
these 25 steps must now be prorated among the 155 rentals paid and 
reduced to dollar figures. The calculation is as shown below Table 77. 
The formula for this method can be written as:

\[ M = M' + \left( \frac{\sum f d_s}{\sum f} \right) i \]

in which i = the width of the step (usually the class interval) ex-
pressed in the original units. The chief advantage of this method over
short-cut method 1 is that the multiplications are so reduced in size 






that they can usually be performed mentally. In computing the average, 
the final multiplication of the prorated net difference by the width 
of the step must never be overlooked. 
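
The step-deviation method can be sketched in the same way. The variable names are assumptions of this sketch; the step width of $10 and the assumed average of $42.50 are those used in the text.

```python
midpoints = [12.5 + 10 * k for k in range(9)]        # 12.50 ... 92.50
frequencies = [16, 27, 44, 17, 18, 11, 10, 9, 3]

assumed = 42.50    # assumed average, M'
step = 10.00       # width of one step (the class interval)

# Deviations counted in steps: -3, -2, -1, 0, +1, ..., +5.
step_devs = [(x - assumed) / step for x in midpoints]

net_steps = sum(f * d for f, d in zip(frequencies, step_devs))

# Prorate the net steps, then convert back to dollars with the step width.
mean_rent = assumed + (net_steps / sum(frequencies)) * step

print(net_steps, round(mean_rent, 2))
```

Omitting the final multiplication by the step width would give 42.50 - .161, an error the text warns against.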

Frequency Distributions with Unequal Classes or Open Ends. On 
some occasions, it is necessary to compute an arithmetic average from 
a frequency distribution in which the class intervals are unequal in 
width or which contains open intervals at either end. An open-end 
frequency distribution is one in which the lower limiting value or the 

TABLE 77

SHORT-CUT METHOD 2 FOR COMPUTING THE ARITHMETIC AVERAGE BY STEP DEVIATIONS
FROM THE FREQUENCY DISTRIBUTION OF RENTALS PAID BY
155 FAMILIES IN COLUMBUS, OHIO

                          (1)        (2)          (3)             (4)
                                               DEVIATION      FREQUENCY ×
CLASS INTERVAL           CLASS       FRE-    IN STEPS FROM     DEVIATION
(dollars)               MIDPOINT    QUENCY      ASSUMED        IN STEPS
                                                AVERAGE        (2) × (3)
                           X          f           d_s            f d_s

 7.50 and under 17.50    12.50       16           -3             -48
17.50 and under 27.50    22.50       27           -2             -54
27.50 and under 37.50    32.50       44           -1             -44
37.50 and under 47.50    42.50       17            0               0
47.50 and under 57.50    52.50       18           +1             +18
57.50 and under 67.50    62.50       11           +2             +22
67.50 and under 77.50    72.50       10           +3             +30
77.50 and under 87.50    82.50        9           +4             +36
87.50 and under 97.50    92.50        3           +5             +15

Total                               155                          -25

M' = 42.50; i = 10.00
M  = 42.50 + (-25 ÷ 155) 10
   = 42.50 + (-.161) 10
   = 42.50 - 1.61
   = $40.89

upper limiting value, or both, is not indicated. Although open ends 
should be avoided whenever possible, if it is felt that such a class is 
necessary the total value of all the items in the class, or their average 
value, or the value of each individual item should be given in a footnote 
to aid in the description and analysis of the distribution. Unless infor- 
mation is supplied in one of these three forms, it becomes impossible 
satisfactorily to calculate the arithmetic average of the distribution, 
because the data required in the computation are not all available. 

Table 78 illustrates an open-end frequency distribution that also 
contains unequal class intervals. The computation of the arithmetic 
average of this kind of distribution is shown in the table. The unequal 
class intervals in this case may be due to the administrative requirements 
of the association. 






The arithmetic-average purchase can be calculated directly by divid- 
ing the total sales by the number of purchases, just as was done in the 
distribution with equal class intervals. Care must be exercised, however, 
to use the correct midpoints of the classes. Short-cut method 1 may 
save time in calculation, and, as shown in columns 4 and 5 of Table 78, 
can be applied in this type of distribution just as easily as in one with 
equal class intervals. 

The step method is not commonly used in distributions with unequal 
classes. If it were employed in this calculation, columns 1, 2 and 4 
would be unchanged. From column 4 it would be apparent that the 
width of the step should be $5. The step deviations would read -11,
-7, 0, +11, +32, and +61. The computation would be,

M = $65.00 - (655 ÷ 724)($5) = $60.48.
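
The three computations for the unequal-class distribution of Table 78 can be compared in one sketch. The midpoints are entered by hand, the $370.00 for the open-end class being the average supplied in the footnote to the table; the variable names are assumptions of this sketch.

```python
frequencies = [248, 140, 202, 74, 49, 11]
midpoints = [10.0, 30.0, 65.0, 120.0, 225.0, 370.0]   # last value is a class average
n = sum(frequencies)

# Direct method: sum of fX divided by the number of purchasers.
total_sales = sum(f * x for f, x in zip(frequencies, midpoints))
direct = total_sales / n

# Short-cut method 1: dollar deviations from an assumed average of $65.00.
assumed = 65.0
net_dollars = sum(f * (x - assumed) for f, x in zip(frequencies, midpoints))
shortcut = assumed + net_dollars / n

# Step method with a $5 step, the largest width dividing every deviation.
step = 5.0
net_steps = sum(f * ((x - assumed) / step) for f, x in zip(frequencies, midpoints))
stepped = assumed + (net_steps / n) * step

print(round(direct, 2), round(shortcut, 2), round(stepped, 2))
```

With unequal classes the step deviations are no longer consecutive integers, which is why the step method saves little labor here.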



TABLE 78

METHOD OF CALCULATING THE AVERAGE VALUE OF PURCHASES OF ACTIVE PATRONS OF A
CO-OPERATIVE ASSOCIATION, JANUARY 1 TO DECEMBER 31, 1937*

                                                                SHORT-CUT METHOD
                            (1)         (2)        (3)          (4)          (5)
                         NUMBER OF                            Deviation
VALUE OF PURCHASES         PUR-        MID-       FRE-          from       Frequency
(dollars)                 CHASERS      POINT     QUENCY ×      Assumed         ×
                          (FRE-                   MID-         Average,    Deviation
                          QUENCY)                 POINT         $65.00
                             f           X         fX             d           fd

  0.00 and under  20.00     248        10.00      2,480          - 55       -13,640
 20.00 and under  40.00     140        30.00      4,200          - 35       - 4,900
 40.00 and under  90.00     202        65.00     13,130             0             0
 90.00 and under 150.00      74       120.00      8,880          + 55       + 4,070
150.00 and under 300.00      49       225.00     11,025          +160       + 7,840
300.00 and over              11       370.00†     4,070          +305       + 3,355

Total                       724                  43,785                     - 3,275

M = 43,785 ÷ 724 = $60.48

M = 65.00 + (-3,275 ÷ 724) = 65.00 - 4.52 = $60.48

* From unpublished business reports of the association.

† Average value obtained by dividing total sales in the interval by the number of purchases.

The problem of the open-end distribution can be solved as indicated 
whenever the total of the open-end class is known, as it is in this case, 
or when the average value for the class is provided. Too frequently in 
published data neither of these values is known, so that, without em- 
ploying dangerous assumptions, it becomes impossible to calculate the 
arithmetic average. Distributions of this type are common in many 




kinds of census data. In such instances, it is necessary to employ one 
of the other measures of central tendency that depend upon position 
and do not make use of extreme values. These measures are described 
in chapter XVII. 

THE GEOMETRIC AVERAGE 

Definition 

The geometric average is a measure of central tendency which, like 
the arithmetic average, depends upon the values of all the items in the 
group. It is formally defined as the positive value of the nth root of 
the product of n positive items. This definition may sound very for- 
bidding, but the method of computation is relatively simple. In symbols, 
it becomes: 

\[ \text{Geometric mean} = \sqrt[n]{X_1 \times X_2 \times X_3 \times \cdots \times X_n} \]

Following the definition it is only necessary to extract the nth root 
of the product of the n items included. This work is greatly facilitated 
by the use of logarithms, the geometric average being simply the anti- 
logarithm of the arithmetic mean of the logarithms of all the items. 

The Average of Ungrouped Data 

The use of logarithms in computing the geometric average is illus- 
trated in Table 79. The logarithms come directly from the table in 
Appendix C. The sum of the logarithms divided by the number of 
items gives the logarithm of the geometric average, and the anti- 
logarithm is the geometric average. As shown at the bottom of the 
table the geometric average of these five numbers is 59.3, whereas the 
arithmetic average is 177.8. The arithmetic average is three times as 
great as the geometric average. This difference is the result of the 
greater importance of large values in the arithmetic average, which 
exceeds four of the five items. The geometric, on the other hand, has 
three items below it and two above. The example is intended to bring 
out the more representative character of the geometric as an average 
of values that are scattered as much as those in the table. 
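The logarithmic computation just described can be sketched in a few lines of Python, using the five numbers of Table 79. The variable names are illustrative; `math.log10` stands in for the table of common logarithms, and the last line repeats the calculation directly from the definition as a cross-check.

```python
import math

values = [6, 22, 50, 175, 636]  # the five numbers of Table 79

# Logarithmic method: average the logarithms, then take the anti-logarithm.
logs = [math.log10(x) for x in values]
log_gm = sum(logs) / len(values)
geometric_mean = 10 ** log_gm

arithmetic_mean = sum(values) / len(values)

# Cross-check from the definition: the nth root of the product of n items.
root_of_product = math.prod(values) ** (1 / len(values))

print(round(log_gm, 5), round(geometric_mean, 2), arithmetic_mean)
```

Working through logarithms avoids forming the very large product that the definition calls for, which is the reason the text recommends it.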

TABLE 79

COMPUTATION OF THE GEOMETRIC AVERAGE OF FIVE NUMBERS

 NUMBER        LOG OF NUMBER
     6            0.77815
    22            1.34242
    50            1.69897
   175            2.24304
   636            2.80346

 5)889          5)8.86604
 M = 177.8      log G.M. = 1.77321
                G.M. = 59.32

The question of representativeness is related to the properties of the
two averages. The arithmetic average is so located that the sum of
the deviations of the individual values from it will be zero. One value
that greatly exceeds the others provides a large deviation that offsets
many small ones near the point of concentration of the data. Thus
whenever a few exceptionally large values are included in the set, the
arithmetic average will exceed the values of a majority of the individual
items.

The corresponding property of the geometric average is: the product
of the ratios of the individual items to the average equals unity. In this
computation an item one tenth as large as the average offsets an item
ten times as large. For example, in Table 79 for the first item, 6,
the ratio to the average is .101, and for the last item, 636, the ratio
is 10.7. Therefore the two offset each other almost exactly in the
computation of the average.
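The two balancing properties can be verified numerically. The sketch below, again using the five numbers of Table 79 and illustrative variable names, checks that the deviations from the arithmetic average sum to zero and that the ratios to the geometric average multiply to one.

```python
import math

values = [6, 22, 50, 175, 636]  # Table 79

arithmetic_mean = sum(values) / len(values)
geometric_mean = math.prod(values) ** (1 / len(values))

# Arithmetic average: the deviations from it sum to zero.
sum_of_deviations = sum(x - arithmetic_mean for x in values)

# Geometric average: the ratios of the items to it multiply to unity.
ratios = [x / geometric_mean for x in values]
product_of_ratios = math.prod(ratios)

print(round(ratios[0], 3), round(ratios[-1], 1))
print(abs(sum_of_deviations) < 1e-9, math.isclose(product_of_ratios, 1.0))
```

Because the computer works with rounded binary fractions, the checks allow a very small tolerance rather than demanding exact zero and exact unity.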

In general, the geometric average should be used when a few large 
items destroy the representativeness of the arithmetic average. This situ- 
ation is particularly likely to arise in averaging ratios when most of 
them fall close to the lower limit of the available range, and a few 
have much higher values. The geometric average can frequently be 
employed to advantage in measuring average rates of change from one 
time period to another. 

The Average of a Frequency Distribution 

The geometric average can be computed from a frequency distribu- 
tion by a method very similar to that employed in calculating the arith- 
metic average. It is necessary to remember the basic assumptions of the 
grouping of data in a frequency distribution: that all items in each 
class interval are evenly distributed throughout the interval, and that 
for each interval a single value must be selected which is representative 
of all the values in the interval or which is equivalent to the average 
of the values of the items in the interval. The midpoint of each
interval, which is equal to the arithmetic average of the class limits, was
assumed to be the arithmetic mean of the values included in the interval. 
To be consistent in calculating the geometric average of a frequency 
distribution, the geometric average of the class limits should be used 
for this purpose, but the additional work involved in carrying out these 
calculations is not justified by the improvement in the results. As a 
consequence the class marks are assumed to be the geometric averages 
of the items in the several classes. 

A formula for finding the geometric average of a frequency distribu- 
tion can be developed from the corresponding arithmetic average for- 
mula by substituting logarithms of values for direct use of values, i.e., 

\[ \log G.M. = \frac{\sum (f \log X)}{\sum f} \]

in which X stands for the class marks and f for the corresponding
frequencies. The anti-logarithm of the results obtained by performing 
the operations indicated on the right side of the equation is the 
geometric average. 

The steps in the process are illustrated in Table 80 by the
computation of the average of the price relatives of 771 of the commodities
included in the Bureau of Labor Statistics Index of Wholesale Prices.
The relatives express the change produced in the United States price
system by the outbreak of war in Europe. The average increase is 6.0
per cent. The significance of this change in a period of 30 days might
be overlooked until one reflects that the prices in the table represent
transactions approximating twenty billions of dollars monthly. Hence
an increase of 6.0 per cent in price would add 1.2 billions of dollars
to the exchange value of goods. The arithmetic average increase in
the price relatives is 6.5 per cent. The difference between the two
averages is far from negligible in terms of the increase in exchange value
of goods.
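The formula can be applied to the figures of Table 80 as in the following Python sketch. The class marks and frequencies are those of the table (each frequency agrees with the printed product of frequency and logarithm), and the six relatives in the two open-end classes are entered individually, as the table footnotes direct. The variable names are illustrative.

```python
import math

# (class mark X, frequency f) for the closed classes of Table 80.
closed_classes = [
    (97, 16), (100, 378), (103, 117), (108, 75), (113, 60), (118, 37),
    (123, 23), (128, 22), (133, 15), (138, 8), (143, 8), (148, 2), (153, 4),
]
# Individual relatives in the two open-end classes (table footnotes).
open_end_items = [94, 92, 88, 61, 161, 162]

n = sum(f for _, f in closed_classes) + len(open_end_items)

# Sum of f log X, with the open-end items contributing their own logarithms.
sum_f_log_x = (
    sum(f * math.log10(x) for x, f in closed_classes)
    + sum(math.log10(x) for x in open_end_items)
)
log_gm = sum_f_log_x / n
geometric_mean = 10 ** log_gm

# Arithmetic average of the same distribution, for comparison.
sum_f_x = sum(f * x for x, f in closed_classes) + sum(open_end_items)
arithmetic_mean = sum_f_x / n

print(n, round(geometric_mean, 1), round(arithmetic_mean, 1))
```

The small differences from the printed total of the logarithms arise only because the table carries five decimal places.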

In a distribution of ratios such as this, the arithmetic average places 
undue emphasis on the ratios above the peak of the distribution. This 
characteristic will be referred to in the discussion of index numbers 
as the inherent upward bias of the arithmetic average. A biased result 
can be avoided by the use of the geometric average. When distributions 
are approximately normal, however, this advantage of the geometric 
average tends to disappear because in such cases there will be very 
little difference between the two averages. 






TABLE 80

CALCULATION OF GEOMETRIC AVERAGE FROM FREQUENCY DISTRIBUTION OF RELATIVES.
WHOLESALE PRICE CHANGES IN 771 COMMODITIES, FROM AUGUST
TO SEPTEMBER, 1939*

 PRICE RELATIVES        NO. OF        CLASS MARKS
   (per cents)        COMMODITIES     (per cents)       LOG X         f LOG X
 SEPT. 1939 ÷              f               X
  AUG. 1939

 Less than 94.5            4†             ...            ...           7.66673§
  94.5- 99.5              16               97          1.98677        31.78832
  99.5-100.5             378              100          2.00000       756.00000
 100.5-105.5             117              103          2.01284       235.50228
 105.5-110.5              75              108          2.03342       152.50650
 110.5-115.5              60              113          2.05308       123.18480
 115.5-120.5              37              118          2.07188        76.65956
 120.5-125.5              23              123          2.08991        48.06793
 125.5-130.5              22              128          2.10721        46.35862
 130.5-135.5              15              133          2.12385        31.85775
 135.5-140.5               8              138          2.13988        17.11904
 140.5-145.5               8              143          2.15534        17.24272
 145.5-150.5               2              148          2.17026         4.34052
 150.5-155.5               4              153          2.18469         8.73876
 155.5 and over            2‡             ...            ...           4.41635§

 Total                   771                                        1561.44988

\[ \log G.M. = \frac{1561.44988}{771} = 2.02523 \]

G.M. = 106.0 per cent

* Prepared from Monthly Release, Average Wholesale Prices and Index Numbers of
Individual Commodities, United States Bureau of Labor Statistics, August and September, 1939.
† 94, 92, 88, 61.
‡ 161, 162.
§ Sum of the logarithms of the individual relatives in this class.

Characteristics of the Geometric Average 

Every value in a group of data must be included in the calculation 
of the geometric average and hence the value of this measure of central 
tendency cannot be influenced by individual judgment factors. In this 
respect, the geometric average and the arithmetic average are similar. 
They differ chiefly because in the geometric average small values have
a greater effect than large ones, whereas the reverse is true in the
arithmetic average. Consequently the arithmetic average is never smaller
than the geometric average; the two are equal only when all the values
are alike.

The geometric average can be employed only when all the values 
are positive, and none is zero. In view of this restriction, a geometric 
average percentage of profits and losses, for instance, cannot be cal- 
culated, for such data include plus and minus values. And a geometric 
average of percentage decreases and increases can be calculated only 
by expressing them as percentage relatives.
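The conversion to percentage relatives can be illustrated with a short Python sketch. The four percentage changes below are hypothetical figures chosen for illustration and do not come from the text; the variable names are likewise illustrative.

```python
import math

# Hypothetical percentage changes (not from the text): two increases, two decreases.
percent_changes = [10.0, -5.0, 25.0, -20.0]

# Express each change as a percentage relative: +10 per cent becomes 110, and so on.
relatives = [100.0 + p for p in percent_changes]

# The geometric average requires every value to be positive.
if any(r <= 0 for r in relatives):
    raise ValueError("every relative must be positive")

log_gm = sum(math.log10(r) for r in relatives) / len(relatives)
geometric_relative = 10 ** log_gm

# Convert the average relative back to an average percentage change.
average_change = geometric_relative - 100.0

print(relatives, round(average_change, 2))
```

The raw changes could not have been averaged geometrically, since two of them are negative, whereas the relatives 110, 95, 125, and 80 are all positive.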






The geometric average is not so easy to understand as the arithmetic 
average. Although the principle of the geometric average appears 
clear enough, its meaning is not easy to comprehend; computation and
practice in interpretation are essential before it becomes a readily avail- 
able tool in statistical analysis. The length of the computation and 
the lack of easily understood properties have been important factors 
in restricting its use as a measure of central tendency. 

PROBLEMS 

1. Explain the difference between an unweighted and a weighted arithmetic 
average. 

2. Six averages are computed from a set of values for variables, X, and a set 
of weights, W, as follows: 



    X     W    X × W          X     W    X × W          X     W    X × W

    7    10      70          24    10     240           5     1       5
   24     1      24          12     2      24          15    10     150
    5    12      60           7     4      28          24    12     288
   15     2      30           5    12      60           7     2      14
   12     4      48          15     1      15          12     4      48

 5)63    29)    232        5)63    29)    367        5)63    29)    505
 12.6             8        12.6          12.7        12.6          17.4



What is the relation of these averages to the discussion on pages 391-92? 

3. One year ago an investor owned the following stocks and received the 
annual dividend returns as stated: 



 STOCK        INVESTMENT      DIVIDEND RETURN

   A           $ 5,000            $300
   B            12,000             480
   C             2,000             160

 Total         $19,000            $940

Average return 4.95 per cent

Today his stock holdings are as follows:

 STOCK        INVESTMENT      DIVIDEND RETURN

   A           $ 8,000            $480
   B             6,000             240
   C             5,000             400

 Total         $19,000          $1,120

Average return 5.89 per cent




a) How are the average rates of return obtained? 

b) Inasmuch as none of the individual dividend rates has changed during 
the year, how do you explain the increase in the average return? 

4. Compute the weighted average percentage of change in retail sales in the 
following lines of trade in Ohio in September, 1939, compared with 
August, 1939 (data from Table 74, page 394). 



Grocery with meats 
Department stores 
Motor vehicles 



Gasoline filling stations 

Restaurants 

Drugs 



5. In accordance with the list on page 397 of the text compute the number 
of food units required for the following family: 

AGE 

Husband 32 

Wife 28 

Grandmother 60 

Son 5 

Daughter 1 

6. On a single graph, plot the following two distributions of earnings in 
Hosiery Mill XYZ: 



   WEEKLY EARNINGS           SEMI-SKILLED EMPLOYEES
      (dollars)               Male        Female

  6.00 and under 10.00          0           21
 10.00 and under 14.00         33           45
 14.00 and under 18.00         91           56
 18.00 and under 22.00        122           28
 22.00 and under 26.00         74           12
 26.00 and under 30.00         24            6
 30.00 and under 34.00          4          ...
 34.00 and under 38.00          2          ...

 Total                        350          168



7. a) Using data in Problem 6, find the average weekly earnings of (1)
semi-skilled males, or of (2) semi-skilled females, using three different
methods of computation. Indicate all computations.

b) From the shape of the distribution in Problem 6 and the value of the 
average in (1) or (2), whichever was assigned to you, state the 
characteristics of the distribution of earnings of either male or female 
hosiery workers. 

8. a) (1) What is the arithmetic average income of the total families 






(3) Does the difference between the two averages indicate what the 
average annual cost of owning an automobile ought to be? 
Discuss. 

b) (1) Compute the percentage of families owning automobiles in each 
income group. 

(2) Compute the average of the percentages in (1). 

(3) 36,500 is what per cent of 68,200? 

(4) Does the result of either (2) or (3) give the percentage of 
families in this distribution that own automobiles? Explain. 

CAR OWNERSHIP BY U. S. FAMILIES HAVING INCOMES LESS THAN $5,000 BY 
INCOME GROUPS, DATA FROM A SURVEY IN 1933 IN 18 CITIES, BY 
THE U. S. DEPARTMENT OF COMMERCE 



 INCOME GROUP        TOTAL NO. OF FAMILIES     NO. OF FAMILIES
   (dollars)              REPORTING              OWNING CARS

     0-  499               19,400                   5,800
   500-  999               15,800                   7,200
 1,000-1,499               13,700                   8,600
 1,500-1,999                9,300                   6,700
 2,000-2,999                7,000                   5,600
 3,000-4,999                3,000                   2,600

 Total                     68,200                  36,500









9. The number of stores operated by each of eight retail variety chains in 
1936 was: 



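 CHAINS                              NO. OF STORES,
                                       NOV. 1936*

 W. T. Grant & Co. ................       477
 H. L. Green Co., Inc. ............       134
 S. S. Kresge Co. .................       731
 S. H. Kress & Co. ................       235
 McCrory Stores Corp. .............       194
 G. C. Murphy Co. .................       194
 J. C. Penney Co. .................     1,496
 F. W. Woolworth Co. ..............     1,995

* Survey of Current Business, January, 1937.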

a) Compute the arithmetic average number of stores per chain. 

b) Compute the geometric average number of stores per chain. 

c) Explain why the geometric average is superior for these data. 

10. a) Using the data in columns (A), (B), (C), (D), and (E) of Table 55, 
page 294, compute an unweighted geometric average index for each 
year. 

b) Compare your results with the weighted arithmetic average index 
appearing at the lower right of the table. 

c) Discuss the merits of each index as a measure of the credit standing 
of a prospective borrower. 






11. a) From the following table, compute the arithmetic average and the 
geometric average: 

PER CENT DISTRIBUTION OF INDUSTRIAL ESTABLISHMENTS IN THE UNITED 
STATES, ACCORDING TO VALUE OF PRODUCTS, 1925 





   VALUE OF PRODUCTS            PERCENTAGE OF
       (dollars)             TOTAL ESTABLISHMENTS

     5,000-   20,000                 30
    20,000-  100,000                 37
   100,000-  500,000                 22
   500,000-1,000,000                  5
 1,000,000-2,000,000                  6

 Total ...............              100









b) Which average is more representative for the data? Give reasons. 



CHAPTER XVII 

MEASURES OF CENTRAL TENDENCY: AVERAGES
OF POSITION

THE preceding chapter was devoted to the discussion of those 
measures of central tendency that depend upon calculation 
processes. This chapter proceeds with the description of meas- 
ures that are determined by their positions in a given set of data and 
hence require the exercise of the computer's judgment. There are 
two commonly used measures of this type, (1) the median and 
(2) the mode. 

THE MEDIAN 

Whether in an array or a frequency distribution, the median is the 
value of the middle item. Expressed more precisely, it is the value 
at the point on either side of which there is an equal number of items. 
That is, when the number of items is odd the median has the value
of the middle item; when the number is even it lies half way between
the two items at the center.

Location and Value of the Median in an Array 

For a given set of data the location of the value of the median will 
be at the same item or between the same two items, whether the value 
of each item is known separately or whether they are grouped in a 
frequency distribution. Slightly different procedures are needed in 
the two cases, however, to fix the location of the median item and 
to determine its value. A simple diagram will explain the reason for 
this difference. 

Suppose that we wish to find the middle point of a distance of
5 miles. Obviously it is located at 2.5 miles, computed by taking
5/2, or N/2 if N = the number of miles. But if we have 5 men, one
located at the center of each mile, the middle man is not the 2.5th
but the 3d. From Figure 59, it is clear that each man's number is .5
more than the mileage at the point where he is standing. This is
because the men are located and numbered at the center of each space,
whereas each milestone is at the end of the space it measures. Thus
the 3d man stands at the 2.5th mile and the two medians coincide
although they are designated differently.

FIGURE 59
LOCATION OF THE MEDIAN IN AN ARRAY

[Figure: 5 men, numbered 1 to 5, each standing at the center of one of 5
consecutive miles. The median man (the 3d) stands at the median distance
(2.5 miles).]





Application of Formula. When individual items are arrayed they
correspond to the 5 men in Figure 59. Therefore in order to find the
middle item it is necessary to add .5 of an item to the fraction N/2.
This is accomplished by adding 1 to the numerator. Thus the formula
for locating the median item in an array is (N + 1)/2. Its value is then
simply that of the middle item or the average of the two central
items.

The usefulness of the formula in finding the middle one of a large
number of items is illustrated in Table 81. The formula for the
median item is:

\[ \frac{N + 1}{2} = \frac{65 + 1}{2} = 33 \]

Counting from either end of the array, the value of the 33d wage,
$53.35, is the median.

If there had been 64 pay checks in the group (omitting the last
one, $80.12) the solution by the formula would show the median
to be the 32.5th item. It would then be a value half way between
the 32d and 33d items and would be equal to one-half the sum
(the arithmetic average) of the values of the 32d and 33d wages, i.e.,
($52.93 + $53.35) ÷ 2 = $53.14 = the median. This value ($53.14)
is not the value of any single wage in the array, but is the value
half way between the two central items and on each side of which
there is an equal number of items.
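The location formula and the two cases (odd and even numbers of items) can be sketched in Python with the 65 wages of Table 81. The function names `median_position` and `median_of_array` are illustrative and are not part of the text.

```python
# Weekly wages from Table 81, already arrayed from lowest to highest.
wages = [
    36.96, 38.84, 38.96, 41.12, 47.02, 47.99, 49.07, 49.29, 49.35, 49.43,
    49.43, 49.43, 50.03, 51.16, 51.90, 51.92, 51.92, 51.93, 51.96, 52.48,
    52.62, 52.73, 52.73, 52.73, 52.73, 52.73, 52.83, 52.83, 52.83, 52.83,
    52.92, 52.93, 53.35, 53.43, 53.58, 53.66, 53.73, 53.93, 54.32, 54.74,
    55.52, 56.31, 56.43, 56.43, 56.43, 56.43, 56.43, 56.43, 56.43, 58.34,
    58.58, 59.13, 60.01, 60.12, 61.36, 62.69, 63.45, 68.62, 68.62, 71.34,
    73.42, 73.49, 73.49, 78.82, 80.12,
]


def median_position(n):
    """Location of the median item in an array of n items: (N + 1) / 2."""
    return (n + 1) / 2


def median_of_array(values):
    """Value of the median: the middle item, or the average of the two central items."""
    ordered = sorted(values)
    position = median_position(len(ordered))
    lower = ordered[int(position) - 1]        # item just at or below the position
    upper = ordered[int(position + 0.5) - 1]  # item just at or above the position
    return (lower + upper) / 2


print(median_position(len(wages)), median_of_array(wages))
print(median_position(64), round(median_of_array(wages[:-1]), 2))
```

When the position is a whole number the two items picked are the same, so the average simply returns the middle item.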

TABLE 81

ARRAY OF WEEKLY WAGES OF 65 SKILLED RUBBER WORKERS
IN A FACTORY IN MICHIGAN*

 $36.96      $51.92      $53.35 (median)      $56.43
  38.84       51.93       53.43                58.34
  38.96       51.96       53.58                58.58
  41.12       52.48       53.66                59.13
  47.02       52.62       53.73                60.01
  47.99       52.73       53.93                60.12
  49.07       52.73       54.32                61.36
  49.29       52.73       54.74                62.69
  49.35       52.73       55.52                63.45
  49.43       52.73       56.31                68.62
  49.43       52.83       56.43                68.62
  49.43       52.83       56.43                71.34
  50.03       52.83       56.43                73.42
  51.16       52.83       56.43                73.49
  51.90       52.92       56.43                73.49
  51.92       52.93       56.43                78.82
                                               80.12

* Confidential unpublished source.

The Extended Median. It is obvious that a measure that is
determined by the value of a single item or the average of two items has
some unreliability, especially if there is only a small number of items
in a sample. Suppose that an array consists of the five items, 5, 7, 15,
17, and 18; the median is 15. If two lower values, 2 and 3, are added,
the median is 7; it has been reduced by 8 units, whereas if two higher
values, 20 and 24, had been added instead, it would have been
increased by only 2 units, to 17. In a small group therefore, unless
the values are closely concentrated, the central item may be shifted
so much by the addition of a very few items at either end that the
median is too erratic to be depended upon. In such a case it is possible
to resort to a more stable measure, the extended median. This is
obtained by averaging the 3, 4, or 5 central items instead of taking
a single item or the average of only two of them. Using the same
example that was suggested above, the extended median of 5, 7, 15,
17, and 18 is (7 + 15 + 17)/3 = 13. Adding the two lower values, the extended
median of 2, 3, 5, 7, 15, 17, and 18 is (5 + 7 + 15)/3 = 9; adding the two
higher values, the extended median of 5, 7, 15, 17, 18, 20, and 24 is
(15 + 17 + 18)/3 = 16 2/3. Thus the extended median fluctuated 4 points on one
side of the center and 3 2/3 points on the other side when two more items
were added at the ends of the series, whereas the ordinary median
fluctuated 8 points on one side and 2 on the other. As the number of items
averaged at the center is increased, the fluctuations will become smaller
and more even on either side.
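The extended median of the example can be reproduced with the following Python sketch. The function name `extended_median` and the parameter `k` (the number of central items averaged) are illustrative; the sketch assumes that `k` and the number of items are both odd or both even, so that the central group is unambiguous.

```python
def extended_median(values, k=3):
    """Average of the k central items of an array (a sketch, not a standard routine)."""
    ordered = sorted(values)
    n = len(ordered)
    if k > n or (n - k) % 2 != 0:
        raise ValueError("k must not exceed n, and n - k must be even")
    start = (n - k) // 2              # number of items left out at each end
    central = ordered[start:start + k]
    return sum(central) / k


base = [5, 7, 15, 17, 18]
with_low = base + [2, 3]      # two lower values added
with_high = base + [20, 24]   # two higher values added

print(extended_median(base), extended_median(with_low), extended_median(with_high))
```

Raising `k` averages more of the central items and, as the text notes, should make the measure steadier still.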

This measure is particularly valuable in determining the seasonal 
pattern of a time series, a purpose for which the median value of each 
month over a period of years is very commonly used. Speaking of the 
use of the extended median for obtaining seasonal variations, Chris- 
topher Saunders, of the University of Manchester, has said: "The 
extended median is probably the best average to take, since it is not 
influenced by extreme values which may be due to exceptional
circumstances, and at the same time is not, like the simple median, affected
by the accident of the value in any single year." 1

Location and Value of the Median in a Frequency Distribution 

In order to explain why the determination of the median in a 
frequency distribution differs from that in an array, it is necessary 
to go back to Figure 59. The 5 miles correspond to the range of unit 
values of class intervals in a frequency distribution. But instead of 
having one item in the center of each mile, or interval, there may 
be any number of items. In the absence of detailed information to 
the contrary, these are assumed to be evenly distributed throughout 
the interval, each one being at the center of its own "item range." 
Figure 60 illustrates a very simple frequency distribution made up of
10 weekly wages selected from Table 81 and grouped in three class
intervals. If we knew the value of each of these 10 items, we could
determine the location of the median from the formula for an array,
(N + 1)/2 = (10 + 1)/2 = 5.5, and its value would be half way between the 5th
and 6th items (row B). But, since we know nothing except the
class values and the number of items in each class, some other method
must be found for determining an approximate value for the median.

FIGURE 60
LOCATION OF THE MEDIAN IN A FREQUENCY DISTRIBUTION

[Figure: five parallel rows. [A] 10 wage items or frequencies (2, 5, and 3
in the three classes); [B] assumed distribution of items, numbered 1 to 10;
[C] measure of item ranges, 0 to 10; [D] measure of unit values, $40 to $80;
[E] 3 class intervals, $40 to $50, $50 to $60, and $60 to $80. The median is
marked at $56.]

 WAGES        NO. OF EMPLOYEES
 $40-50              2
  50-60              5
  60-80              3
                    10

Application of Formula. The logical procedure is to interpolate
from the assumed distribution of items in the class in which the
median item falls to the unit values of that class. The median is still
the value midway between the 5th and 6th items but, in order to
interpolate on the diagram, the values of the items as well as of the
class intervals must be measured along a scale. Hence we use the
"milestones" (row C) that mark the ends of the "item ranges" 2
instead of the numbers (row B) that count the items. The center
of the 10 item ranges is therefore N/2 = 10/2 = 5, and it will be seen
from Figure 60 that this point in row C coincides with the point
midway between the 5th and 6th items in row B. A line drawn from
the end of the 5th item range intersects the scale of unit values
(row D) at $56, which is therefore the value of the median wage
for this group of 10 items.

1 Christopher Saunders, "Seasonal Variations in Employment," The Economic Journal,
Vol. XLV, No. 178 (June 1935), p. 272.
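The interpolation read from the diagram can also be sketched as a computation. The Python function below is an illustrative reading of the procedure, assuming, as the text does, that the items are evenly spread through each class; the name `grouped_median` and the tuple layout are assumptions made for the sketch.

```python
def grouped_median(classes):
    """Median of a frequency distribution by straight-line interpolation.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    """
    n = sum(f for _, _, f in classes)
    target = n / 2                    # center of the N item ranges
    cumulative = 0                    # item ranges lying below the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            # Share of this class's item ranges needed to reach the center,
            # carried over to the scale of unit values.
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("the distribution contains no items")


# The three classes of Figure 60: $40-50, $50-60, $60-80.
figure_60 = [(40, 50, 2), (50, 60, 5), (60, 80, 3)]
print(grouped_median(figure_60))
```

Here the center of the item ranges, 5, lies 3 item ranges into a class of 5 items that is $10 wide, which places the median $6 above the lower limit of $50.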

By computation, the class in which the median item will fall
is determined by cumulating the frequencies until they exceed
the value of N/2. In this case N/2 = 10/2 = 5. This exceeds 2, the number
of frequencies in the lowest class interval, but is less than 2 + 5,
the sum of the frequencies in the first two classes; hence the median 
falls in the 2d class, $50 to $60. The proportionate distance of the 
median value from the lower limit of the median class interval will 
be the same as the number of its item range in the interval is to the 
total number of item ranges in that interval. To dete